Hi! I'm a PhD candidate with the INDE Lab at the University of Amsterdam. So far, my research has been on learned table representations and their applications like data preparation, exploration, and analysis. My broader interest is in Intelligent Data Systems with a particular focus on systems for relational tables.

My interest in 'learning from tables' started at the MIT Media Lab, where I developed Sherlock, a deep learning method for detecting table semantics at scale, enabling applications like data validation. The huge interest from industry in Sherlock and the dominant presence of tables across data systems inspired me to start a PhD in 2020 to focus on learned table models and their effectiveness in practice. A piece of this puzzle is GitTables: a continuously growing dataset of 1.7M tables extracted from CSV files on GitHub and enriched with table semantics such as semantic column types.

In my opinion, tables have been ignored for too long as a modality for representation learning, and they have too much application potential to overlook. Therefore, I founded the Table Representation Learning workshop (hosted at NeurIPS 2022). As part of the wider research community, I support JSys as Assistant Editor, co-organize DEEM (SIGMOD 2023) and the SemTab challenge (2021/2022), and review for various workshops/tracks at e.g. VLDB, EDBT, NeurIPS, WWW. Besides academia, I am a member of the supervisory board of a student consulting firm and was a data scientist for 2+ years, working on automating ML-driven analyses.
You can read more in my CV.

Feel welcome to reach out (click on any channel in the footer)!

Selected projects

The projects below are close to my main research interest, but I enjoy working on other topics too. Check my profile on Google Scholar for my full publication record.

GitTables [Hulsebos et al. (to appear), SIGMOD, 2023]
Corpus of 1.7M relational tables extracted from GitHub CSVs. Columns annotated w/ semantic types.
paper | website | dataset | code | video presentation | slides

GitSchemas [Döhmen et al., DBML@ICDE, 2022]
A dataset of approximately 50K real-world database schemas extracted from SQL files from GitHub.
paper | code/dataset

AdaTyper [Hulsebos et al. (abstract), CIDR, 2022]
Adaptive semantic column type detection system focusing on productization in industry contexts.
paper | video presentation

Sherlock [Hulsebos et al., KDD, 2019]
DL method for semantic data type detection of table columns (top-10 MIT Media Lab repos, 3/10/22).
paper | website | code

VizNet [Hu et al., CHI, 2019]
Corpus of over 31 million datasets from open data repositories, for benchmarking visualization studies.
paper | website

Recent news

  • Excited to talk about Tables at BTW and HPI!

    Mar 05, 2023

    Excited to attend the workshop on ML for Systems and Systems for ML at BTW 2023 and visit the Information Systems group at Hasso-Plattner Institute. On both occasions, I’ll give a talk about (learning representations of) tables. See the slides of my talk here.

  • Tutorial about Neural Table Representations accepted at SIGMOD '23.

    Feb 04, 2023

    I’m excited to give a (3-hour!) tutorial at SIGMOD 2023 about Models and Practice of Neural Table Representations together with Xiang Deng, Huan Sun, and Paolo Papotti. This tutorial will give an overview of the field and include a hands-on session.

  • Co-organizing the DEEM workshop @ SIGMOD 2023.

    Dec 15, 2022

    I’m co-organizing a workshop on Data Management for End-to-End Machine Learning (DEEM) at SIGMOD 2023. Read more here, and I hope to see you in Seattle soon!

  • Reflection on the TRL workshop @ NeurIPS 2022.

    Dec 13, 2022

    This year I co-organized the workshop on Table Representation Learning (TRL) at NeurIPS 2022. The workshop was a great success and received much interest from various communities (NLP/ML/DB), which illustrates the importance and impact of TRL.

  • Co-organizing Table Representation Learning workshop @ NeurIPS 2022.

    Jul 14, 2022

    I’m co-organizing a workshop on Table Representation Learning (TRL) at NeurIPS 2022. Read more here, and I hope to see you in New Orleans very soon!