About me

I'm a postdoctoral fellow at UC Berkeley and incoming faculty at CWI Amsterdam. Before joining UC Berkeley, I did my PhD at the University of Amsterdam, during which I also conducted research at Sigma Computing and the MIT Media Lab.

In my opinion, tables are a promising modality for representation and generative learning with too much application potential to ignore. Therefore, I research Table Representation Learning (check out this index of recent research on TRL) and applications in data management and analysis. Broadly, my objective is to make insight retrieval from structured data a walk in the park for everyone. Some of my research in this area can be found below.

To stimulate research on TRL, I founded the Table Representation Learning workshop (@NeurIPS). Within the wider research community, I co-organize the Data Management for E2E ML workshop at SIGMOD and the Table-to-KG matching (SemTab) challenge. I review for various tracks and workshops at venues such as VLDB, SIGMOD, NeurIPS, ICLR, and WWW. Before starting in academia, I worked as a data scientist for over two years, automating ML-driven analyses.

Interested in a PhD on neural models for structured data in beautiful Amsterdam? Apply here by 21 July 2024 for the current PhD opening starting in fall 2024! Not ready to start something new? I'll open 4-5 PhD positions and 1 postdoc position in total, starting across 2024-2026; leave your info here if you are interested.
I'm also seeking strong industry partners for collaboration; please reach out if your team is interested!

👉 Read more in my CV.

Selected projects

The projects below reflect my main research interests, but I enjoy working on other topics too. Check my profile on Google Scholar for my full publication record.

Dataset Search [HILDA@SIGMOD, 2024]
1) Survey results surfacing why, what, and how people search for data, key open challenges, and system desiderata.
2) System (tbc).
1) survey paper

SchemaPile [SIGMOD 2024]
A dataset of approximately 221K real-world database schemas extracted from SQL files on GitHub.
paper | dataset | code

Observatory [PVLDB, TRL@NeurIPS, 2023]
1) Framework for analyzing table embeddings based on the relational model, and desiderata for TRL models.
2) Library for extracting table embeddings at the row, column, and cell level.
1) analysis paper | 2) library paper | code

GitTables [SIGMOD, 2023]
Corpus of 1.7M relational tables extracted from GitHub CSVs, with columns annotated with semantic types.
paper | website | dataset | code | video presentation | slides | podcast

AdaTyper [CIDR, 2022]
Adaptive semantic column type detection system focusing on productization in industry contexts.
paper | video presentation

Sherlock [KDD, 2019]
Deep learning method for semantic type detection of table columns (among the top-5 MIT Media Lab repositories as of 2 Aug 2023).
paper | website | code

VizNet [CHI, 2019]
Corpus of over 31 million datasets from open data repositories, for benchmarking visualization studies.
paper | website

Recent news

  • Attended SIGMOD!

    Jun 20, 2024

    Attended SIGMOD in Chile! It was fun and busy, and Santiago was lovely. I co-organized the DEEM workshop and presented our survey on dataset search in practice at HILDA. This year, I was particularly interested in the discussions and talks about retrieval, (structured) data semantics, and vector databases.

  • Awarded AiNed Fellowship Grant funding 5-year research project at CWI

    Mar 20, 2024

    Thrilled to share that I’ve been awarded the AiNed Fellowship Grant (worth $1M) to lead the 5-year DataLibra research project at CWI in Amsterdam, starting fall 2024. DataLibra focuses on democratizing insight retrieval from structured data through representation and generative learning over tables.

  • Postdoctoral Fellowship at Berkeley

    Nov 27, 2023

    Pleased to have started a new position as a postdoctoral scholar at the University of California, Berkeley. I’m looking forward to starting some interesting new projects and collaborating with the great EPIC Lab!

  • Best Reviewer Award at the VLDB 2023 PhD Workshop

    Sep 12, 2023

    Grateful to have received the Best Reviewer Award at the VLDB 2023 PhD workshop! Hearing that my reviews are considered valuable means a lot to me.

  • Upcoming talks

    Aug 04, 2023

    Excited to be invited to talk about Transformers for Tables at the Transformers at Work event (15 Sep 2023), and about GitTables at the TaDA workshop (remote) at VLDB (1 Sep 2023). You're very welcome to join!