3rd Biomedical STAR-AI Symposium – Student AI Competition – Data Access Requirements
STAR AI — Data Access Requirements Guide
Safety • Trustworthy • Actionable • Responsible
Overview
This document outlines the specific steps, documentation, and training requirements that team members must complete to obtain official access to the datasets used in the STAR AI competition projects. Requirements vary by dataset source. Please carefully review the section corresponding to your assigned project and begin the access process as early as possible, as some approvals may take several business days or longer.
Important: Many of these datasets contain protected health information or sensitive research data. All team members are expected to handle data responsibly and in compliance with the applicable Data Use Agreements.
Project – Dataset Summary
The table below maps each STAR AI project to its selected dataset(s), access platform, and required training.
| Project | Dataset | Platform | Access Level | Training |
|---|---|---|---|---|
| 01: Multi-Agent Multimodal RAG for Radiology Decision Support | Med-MAT (2025) | HuggingFace | Open Access | None |
| 01: Multi-Agent Multimodal RAG for Radiology Decision Support | RaDialog Instruct Dataset v1.1.0 | PhysioNet | Credentialed | CITI Training |
| 02: Robust Drug-Target Affinity for Precision Drug Discovery | DrugForm-DTA (2025) | Zenodo | Open Access | None |
| 03: Safe Clinical Reasoning Agents for EHR-Based Diagnosis | EHRCon v1.0.1 | PhysioNet | Restricted | CITI Training |
| 04: ICU Multi-Outcome Alarms for Critical Care Monitoring | eICU-CRD v2.0 | PhysioNet | Credentialed | CITI Training |
| 05: Wearable Causal Digital Twin for Mental Health Interventions | DREAMT v2.1.0 | PhysioNet | Restricted | CITI Training |
| 05: Wearable Causal Digital Twin for Mental Health Interventions | SHHS | NSRR (sleepdata.org) | Restricted | None (DAUA required) |
Section A: Open Access Datasets (No Training Required)
The following datasets are freely available and do not require CITI training or credentialing. Simply visit the link, agree to any terms of use, and download.
Med-MAT (Project 01)
- Platform: HuggingFace
- Description: Multimodal medical assessment dataset for evaluating medical AI models.
- Link: https://huggingface.co/datasets/FreedomIntelligence/Med-MAT
- Access: Create a free HuggingFace account, navigate to the dataset page, and download.
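Teams that prefer to script the download can use the `huggingface_hub` Python package. The sketch below assumes the package is installed (`pip install huggingface_hub`) and that the repository ID matches the dataset URL above. The helper names and the `data/med-mat` destination are hypothetical. If the Hub asks for authentication, log in with the Hugging Face CLI first.

```python
# Sketch: fetch Med-MAT from the Hugging Face Hub.
# Assumes `pip install huggingface_hub`; helper names and paths are hypothetical.
from pathlib import Path
from typing import Optional, Sequence

# Repository ID taken from https://huggingface.co/datasets/FreedomIntelligence/Med-MAT
MED_MAT_REPO_ID = "FreedomIntelligence/Med-MAT"


def med_mat_download_kwargs(local_dir: str,
                            patterns: Optional[Sequence[str]] = None) -> dict:
    """Build keyword arguments for huggingface_hub.snapshot_download."""
    kwargs = {
        "repo_id": MED_MAT_REPO_ID,
        "repo_type": "dataset",  # datasets are a separate repo type from models
        "local_dir": str(Path(local_dir)),
    }
    if patterns:
        # Restrict the download to files matching these glob patterns.
        kwargs["allow_patterns"] = list(patterns)
    return kwargs


def download_med_mat(local_dir: str = "data/med-mat",
                     patterns: Optional[Sequence[str]] = None) -> str:
    """Download the dataset snapshot and return the local folder path."""
    # Imported here so the helper above works without the package installed.
    from huggingface_hub import snapshot_download
    return snapshot_download(**med_mat_download_kwargs(local_dir, patterns))
```

Calling `download_med_mat()` fetches the full repository. Passing a filter such as `patterns=["*.json"]` (a hypothetical example) limits the download to matching files, which may help if only part of the dataset is needed.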
DrugForm-DTA (Project 02)
- Platform: Zenodo
- Description: Real-world noisy drug–target affinity benchmark that explicitly accounts for drug forms and assay conditions. Released 2025.
- Link: https://zenodo.org/records/14949570
- Access: This Zenodo record is open access. Click “Download” on the record page. No account is required, although a free Zenodo account is recommended.
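Zenodo also exposes records through a REST API, so the download can be scripted and each file's integrity checked. The sketch below uses only the Python standard library. It assumes the API address `https://zenodo.org/api/records/14949570` (derived from the record link above) and that each entry in the returned `files` list carries `key`, `checksum` (formatted as `md5:<hex>`), and `links.self` fields. These field names reflect Zenodo's record JSON as we understand it, so verify them against a live response before relying on the script. Function names and the destination folder are hypothetical.

```python
# Sketch: download the DrugForm-DTA Zenodo record and verify MD5 checksums.
# Standard library only; JSON field names are assumptions about Zenodo's API.
import hashlib
import json
import urllib.request
from pathlib import Path
from typing import List, Optional, Tuple

ZENODO_API_URL = "https://zenodo.org/api/records/14949570"


def fetch_record(url: str = ZENODO_API_URL) -> dict:
    """Fetch the record metadata as a dict."""
    with urllib.request.urlopen(url, timeout=60) as resp:
        return json.load(resp)


def file_entries(record: dict) -> List[Tuple[str, str, Optional[str]]]:
    """Return (file name, download URL, MD5 hex or None) for each file."""
    entries = []
    for f in record.get("files", []):
        algo, _, digest = f.get("checksum", "").partition(":")
        md5 = digest if algo == "md5" and digest else None
        entries.append((f["key"], f["links"]["self"], md5))
    return entries


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute a file's MD5 in chunks so large files need little memory."""
    h = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def download_record(dest: str = "data/drugform-dta") -> None:
    """Download every file in the record and check it against its MD5."""
    out = Path(dest)
    out.mkdir(parents=True, exist_ok=True)
    for name, url, md5 in file_entries(fetch_record()):
        target = out / name
        urllib.request.urlretrieve(url, target)
        if md5 is not None and md5_of(target) != md5:
            raise ValueError(f"checksum mismatch for {name}")
```

Running `download_record()` saves the files under `data/drugform-dta/`. A checksum mismatch raises `ValueError`, so a corrupted download is noticed immediately instead of surfacing later as odd training results.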
Section B: PhysioNet Datasets (CITI Training Required)
Projects 01, 03, 04, and 05 rely on datasets hosted on PhysioNet (physionet.org). PhysioNet uses controlled-access tiers that require CITI training and a credentialing process. The affected datasets are:
| Project | Dataset | Access Level |
|---|---|---|
| 01: Multimodal RAG for Radiology | RaDialog Instruct Dataset v1.1.0 | Credentialed |
| 03: Safe Clinical Reasoning for EHR | EHRCon v1.0.1 | Restricted |
| 04: ICU Alarms for Critical Care | eICU-CRD v2.0 | Credentialed |
| 05: Wearable Digital Twin for Mental Health | DREAMT v2.1.0 | Restricted |
What You Need to Do
Accessing any PhysioNet credentialed or restricted dataset is a three-step process:
Step 1: Complete CITI Training
- Go to the CITI Program website: https://www.citiprogram.org
- Create an account using your institutional email address (not a personal email).
- Select “Massachusetts Institute of Technology Affiliates” as your organization affiliation.
- In the Human Subjects training category, enroll in the “Data or Specimens Only Research” course.
- For the questionnaire, answer questions 1, 2, and 3. For question 5 (Conflicts of Interest), select “Yes.”
- Complete all required modules, including Data or Specimens Only Research and Conflicts of Interest.
- After completion, go to “Records” at the top of the CITI website and download your Completion Report (not the certificate). Select “View-Print-Share” under Completion Record and click “View/Print” to get the full report as a PDF.
Step 2: Create a PhysioNet Account and Submit Credentialing
- Register for an account at https://physionet.org (or log in if you already have one).
- Navigate to your user profile and complete the “Credentialing” page with your personal details.
- On the “Training” page of your profile, upload the CITI Completion Report (PDF).
- If you are a student or postdoc, provide your supervisor’s name and contact information as a reference.
- Linking your ORCID iD to your PhysioNet profile is recommended.
Note: Approval may take several business days. Applications that are incomplete or missing the CITI Completion Report will be delayed.
Step 3: Sign the Data Use Agreement (DUA)
- Once your credentialing is approved, navigate to the specific dataset page on PhysioNet.
- In the “Files” section, sign the Data Use Agreement.
- You will then be able to download the dataset files.
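After the DUA is signed, PhysioNet dataset pages typically offer a command-line download using `wget` with your PhysioNet username. The sketch below builds that command in Python so it can be reused across datasets. It assumes the `https://physionet.org/files/<slug>/<version>/` URL pattern; the `eicu-crd` / `2.0` pair corresponds to eICU-CRD v2.0, but slugs for the other datasets should be copied from each dataset page rather than guessed. The function names and the username in the example are hypothetical.

```python
# Sketch: build and run the wget command PhysioNet offers for dataset downloads.
# The URL pattern is an assumption; copy the slug/version from the dataset page.
import shlex
import subprocess
from typing import List


def physionet_wget_args(slug: str, version: str, username: str) -> List[str]:
    """Return wget arguments for a recursive PhysioNet download."""
    url = f"https://physionet.org/files/{slug}/{version}/"
    return [
        "wget",
        "-r",              # recurse into the dataset directory
        "-N",              # time-stamping: skip files already up to date locally
        "-c",              # resume partially downloaded files
        "-np",             # never ascend to the parent directory
        "--user", username,
        "--ask-password",  # prompt for the password instead of storing it
        url,
    ]


def download_physionet(slug: str, version: str, username: str) -> None:
    """Run wget; raises CalledProcessError if wget exits with an error."""
    args = physionet_wget_args(slug, version, username)
    print("Running:", shlex.join(args))
    subprocess.run(args, check=True)
```

For example, `download_physionet("eicu-crd", "2.0", "your_username")` mirrors the command shown on the eICU-CRD page. Because access is tied to individual approved accounts, each team member should download with their own credentials.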
Credentialed vs. Restricted Access: For this competition, both tiers require CITI training and the credentialing process above. Some datasets impose additional requirements, so check the access policy listed on each dataset's PhysioNet page before requesting access.
PhysioNet Dataset Links
- RaDialog Instruct (Project 01): Link
- EHRCon (Project 03): Link
- eICU-CRD (Project 04): Link
- DREAMT (Project 05): Link
Section C: NSRR / Sleep Heart Health Study (Project 05)
- Create an account at https://sleepdata.org.
- Navigate to the SHHS dataset page: https://sleepdata.org/datasets/shhs
- Click “Request Data Access” and complete the online Data Access and Use Agreement (DAUA).
- Describe your intended use of the data.
- Submit the request. Access is granted for 3 years and is free of charge.
Quick Reference Summary
| Project | Platform | Key Requirement | Est. Timeline |
|---|---|---|---|
| 01: Multimodal RAG | HuggingFace + PhysioNet | HF account (Med-MAT), CITI + Credentialing + DUA (RaDialog) | Immediate (Med-MAT), several business days (RaDialog) |
| 02: Drug-Target Affinity | Zenodo | None (open download) | Immediate |
| 03: Clinical Reasoning | PhysioNet | CITI Training + Credentialing + DUA | Several business days |
| 04: ICU Alarms | PhysioNet | CITI Training + Credentialing + DUA | Several business days |
| 05: Wearable Digital Twin | PhysioNet + NSRR | CITI + Credentialing + DUA (DREAMT), DAUA (SHHS) | Several business days |
Important Reminders
- Start the access process early. PhysioNet credentialing may experience delays.
- Use your institutional or university email address for all registrations.
- For PhysioNet, upload the CITI Completion Report (not the certificate).
- Students and postdocs must provide a supervisor or faculty advisor as a reference.
- Do not share dataset access credentials or data files with anyone who has not been independently approved.
- All datasets must be used in compliance with their respective Data Use Agreements. Do not send PhysioNet data to third-party services (for example, hosted AI or LLM APIs) unless doing so fully complies with the dataset's DUA.
If you encounter any issues during the access process, please reach out to the team lead for guidance.