About me
I'm a postdoctoral fellow at UC Berkeley with the EPIC Lab at EECS / BIDS. Prior to joining UCB, I did my PhD research at the University of Amsterdam and partly at Sigma Computing and the MIT Media Lab.
In my opinion, tables are a promising modality for representation and generative learning with too much application potential to ignore. Therefore, I research Table Representation Learning (abbrev. TRL; I'm curating an index of recent research on TRL) and applications in data management and analysis. Broadly, my objective is to make insight retrieval from structured data a walk in the park for everyone. Some of my research in this area can be found below.
To stimulate research on TRL, I founded the Table Representation Learning workshop (@NeurIPS). I also co-chair the Data Management for E2E ML workshop and the SemTab challenge. I review for various tracks/workshops at e.g. VLDB, SIGMOD, EDBT, NeurIPS, ICML, ICLR, WWW. Besides academia, I've been a member of the supervisory board of a student consulting firm and was a data scientist for 2+ years, working on tools for automated data analysis and ML pipelines.
The projects below reflect my main research interests, but I enjoy working on other topics too. Check my profile on Google Scholar for my full publication record.
1) Framework & tool for analyzing table embeddings based on the relational model and data distributions.
2) Library for extracting table embeddings on row-, column-, and cell-level (TRL workshop @ NeurIPS).
1) analysis paper | 2) library paper | code
Corpus of 1.7M relational tables extracted from GitHub CSVs. Columns annotated w/ semantic types.
paper | website | dataset | code | video presentation | slides | podcast
A dataset of approximately 50K real-world database schemas extracted from SQL files from GitHub.
paper | code/dataset
Adaptive semantic column type detection system focusing on productization in industry contexts.
paper | video presentation
DL method for semantic data type detection of table columns (top-5 MIT Media Lab repos, 2 Aug 23).
paper | website | code
Corpus of over 31 million datasets from open data repositories, for benchmarking visualization studies.
paper | website
Nov 27, 2023
Pleased to have started a new position as postdoctoral scholar at the University of California, Berkeley. I’m looking forward to starting some interesting new projects, and collaborating with the great EPIC Lab!
Sep 12, 2023
Grateful to have received the Best Reviewer Award at the VLDB 2023 PhD workshop! Hearing that my reviews are considered valuable means a lot to me.
Aug 04, 2023
Jul 20, 2023
Had a nice chat on the Disseminate Podcast w/ Jack about the thoughts and processes behind GitTables, and the potential of learned table representations. Listen to the podcast here. Thanks for hosting me, Jack!