Welcome to DataLab
We are a team of research scientists tackling the fundamental challenges and open questions regarding data for AI. We are committed to bridging the gap between research theory and data deployment to push the frontier forward.
Research

What Should Agentic AI Actually Be Trained to Do?
Knowledge, Tasks, and What Data Bottlenecks Exist
Engy Ziedan, Ph.D.

Signal-Grounded Quality Control for Large-Scale Speech Corpora
Principles for quality control in large-scale speech datasets
Rey Pocius, M.S.

Preserving Clinical Reality in International Healthcare Data
Capturing the heterogeneity of international healthcare without introducing systematic selection bias
Allison Fox, M.S.
Latest News
Evaluating Medical AI: What Social Science Already Knows
Five randomized controlled trials from social science that offer valuable lessons in evaluating AI's effects on health
Clinical Documentation & Medical Billing - New Benchmarks for Healthcare
Medical documentation and medical coding sit at the center of healthcare’s administrative burden. We prepare the data behind the latest benchmarks.
Introducing Protege Evaluation Datasets and Benchmarks for Healthcare AI
Real-world datasets that match true patient journeys for unbiased healthcare AI model evaluation.