DataLab at Protege

Research

Research ideas and publications that explore AI's data frontier.

What Should Agentic AI Actually Be Trained to Do?

Knowledge, Tasks, and Where Data Bottlenecks Exist

Engy Ziedan, Ph.D.

Research Brief · Mar 11, 2026

Signal-Grounded Quality Control for Large-Scale Speech Corpora

Principles for quality control in large-scale speech datasets

Rey Pocius, M.S.

Research Brief · Mar 4, 2026

Preserving Clinical Reality in International Healthcare Data

Capturing the heterogeneity of international healthcare without introducing systematic selection bias

Allison Fox, M.S.

Research Brief · Feb 27, 2026