Madelon Hulsebos

I’m a researcher at CWI, where I lead the Table Representation Learning (TRL) Lab. I’m also a faculty and MT member of the ELLIS unit Amsterdam, and a member of the Scientific Advisory Board of Prior Labs. Previously, I did a postdoc at UC Berkeley and obtained my PhD from the University of Amsterdam, during which I spent time at MIT and Sigma Computing. Before academia, I spent 2+ years in industry working on ML-driven automated data science tools. Find my full CV here.

My research focuses on table representation learning (TRL), generative models and systems for tabular data. Tables are prevalent in the data landscape, contain valuable data, and fuel important decisions in organizations across government, industry, and healthcare. My objective, therefore, is to democratize insights from tabular data ✨. My research has been supported by an NWO AiNed grant, a BIDS-Accenture Fellowship, and industry sponsors. You can find an overview of my research interests below.

To establish tabular data as a key modality for AI, akin to images and text, I’ve been driving TRL initiatives since 2021. In particular, I founded the Table Representation Learning workshop series at NeurIPS and ACL, established the TRL research theme at ELLIS Amsterdam, and organize related efforts (♥ the tabular AI community). I review for various tracks and workshops at, e.g., NeurIPS, ICLR, VLDB, and SIGMOD.

Some research interests and contributions in TRL in line with my vision for democratizing insights from structured data:

Tabular reasoning: TARGET table retrieval benchmark, LLMs for table QA, question ambiguity in tabular data analysis.
Table semantics: semantic type detection in tabular data, e.g. Sherlock, contextual sensitive data detection.
Dataset search: vision and insights on dataset search systems, proactive task-based dataset search (DataScout).
Table embeddings & retrieval: Observatory on relational properties in table embeddings, DCTR retriever for complex queries.
Corpora for tabular AI: GitTables: 1M+ real-world tables, SQaLe: 500K+ text-to-SQL triples, SchemaPile: 22K+ DB schemas.

Other topics I am working on include ML-powered tabular predictive insights and generative models for tabular data, among others.