Welcome to DataLab

We are a team of research scientists tackling the fundamental challenges and open questions about data for AI. We work to bridge the gap between research theory and data deployment to push the frontier forward.

Research

What Should Agentic AI Actually Be Trained to Do?

Knowledge, Tasks, and What Data Bottlenecks Exist

Engy Ziedan, Ph.D.

Signal-Grounded Quality Control for Large-Scale Speech Corpora

Principles for quality control in large-scale speech datasets

Rey Pocius, M.S.

Preserving Clinical Reality in International Healthcare Data

Capturing the heterogeneity of international healthcare without introducing systematic selection bias

Allison Fox, M.S.

Latest News

Mar 15, 2026

Evaluating Medical AI: What Social Science Already Knows

Five randomized controlled trials from social science that offer valuable lessons for evaluating AI's effects on health

Feb 19, 2026

Clinical Documentation & Medical Billing - New Benchmarks for Healthcare

Medical documentation and medical coding sit at the center of healthcare’s administrative burden. We prepare the data behind the latest benchmarks.

Jan 23, 2026

Introducing Protege Evaluation Datasets and Benchmarks for Healthcare AI

Real-world datasets that match true patient journeys for unbiased evaluation of healthcare AI models.
