Madelon Hulsebos
I’m a faculty member at CWI, where I lead the Table Representation Learning (TRL) Lab. I’m also faculty in the ELLIS unit Amsterdam. Previously, I did a postdoc at UC Berkeley and obtained my PhD from the University of Amsterdam, during which I spent time at MIT and Sigma Computing. Before academia, I spent 2+ years in industry working on ML-driven automated data analysis tools. Find my full CV here.
My research focuses on table representation learning (TRL), generative models, and systems for tabular data. Tables are prevalent in the data landscape, contain valuable data, and fuel important decisions in organizations across government, industry, and healthcare. My objective, therefore, is to democratize insights from structured data ✨. My research has been supported by an NWO AiNed grant, a BIDS-Accenture Fellowship, and industry sponsors. You can find an overview of my research interests below.
To establish tabular data as a key modality for AI, akin to images and text, I’ve been driving TRL initiatives since 2021. In particular, I founded the Table Representation Learning workshop series at NeurIPS and ACL, established the TRL research theme at ELLIS Amsterdam, and organize related efforts (❤ the tabular AI community). I review for various tracks and workshops at, e.g., NeurIPS, ICLR, VLDB, and SIGMOD.
Some of my research interests and contributions in TRL, in line with my vision for democratizing insights from structured data:
- Tabular reasoning: LLMs for table QA, TARGET table retrieval benchmark, NL query ambiguity in tabular data analysis.
- Dataset search: vision and insights on dataset search systems, proactive task-based dataset search DataScout.
- Table semantics: semantic type detection in tabular data, e.g. Sherlock, contextual sensitive data detection.
- Table embeddings: Observatory on relational properties in table embeddings, role of embedding metadata in table retrieval.
- Corpora for tabular AI: GitTables: 1M+ real-world tables, SQaLe: 500K+ text-to-SQL triples, SchemaPile: 22K+ DB schemas.
Other topics I am working on include ML-powered tabular predictive insights, generative models for tabular data, and beyond.