3rd Biomedical STAR-AI Symposium – Student AI Competition – Data Access Requirements

STAR AI — Data Access Requirements Guide

Safety • Trustworthy • Actionable • Responsible

Overview

This document outlines the specific steps, documentation, and training requirements that team members must complete to obtain official access to the datasets used in the STAR AI competition projects. Requirements vary by dataset source. Please carefully review the section corresponding to your assigned project and begin the access process as early as possible, as some approvals may take several business days or longer.

Important: Many of these datasets contain protected health information or sensitive research data. All team members are expected to handle data responsibly and in compliance with the applicable Data Use Agreements.

Project – Dataset Summary

The table below maps each STAR AI project to its selected dataset(s), access platform, and required training.

Project | Dataset | Platform | Access Level | Training
01: Multi-Agent Multimodal RAG for Radiology Decision Support | Med-MAT (2025) | HuggingFace | Open Access | None
01: Multi-Agent Multimodal RAG for Radiology Decision Support | RaDialog Instruct Dataset v1.1.0 | PhysioNet | Credentialed | CITI Training
02: Robust Drug-Target Affinity for Precision Drug Discovery | DrugForm-DTA (2025) | Zenodo | Open Access | None
03: Safe Clinical Reasoning Agents for EHR-Based Diagnosis | EHRCon v1.0.1 | PhysioNet | Restricted | CITI Training
04: ICU Multi-Outcome Alarms for Critical Care Monitoring | eICU-CRD v2.0 | PhysioNet | Credentialed | CITI Training
05: Wearable Causal Digital Twin for Mental Health Interventions | DREAMT v2.1.0 | PhysioNet | Restricted | CITI Training
05: Wearable Causal Digital Twin for Mental Health Interventions | SHHS | NSRR (sleepdata.org) | Restricted | DUA
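
For teams that want to track requirements programmatically, the summary table can be encoded as a small lookup structure. The following is a minimal sketch, not part of the official process: the class and function names (DatasetAccess, requirements_for, needs_citi) are hypothetical, and the rows are transcribed from the table above.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetAccess:
    project: str       # two-digit project number from the table
    dataset: str
    platform: str
    access_level: str
    training: str      # "None", "CITI Training", or "DUA"


# Rows transcribed from the Project / Dataset summary table.
DATASETS = [
    DatasetAccess("01", "Med-MAT (2025)", "HuggingFace", "Open Access", "None"),
    DatasetAccess("01", "RaDialog Instruct Dataset v1.1.0", "PhysioNet", "Credentialed", "CITI Training"),
    DatasetAccess("02", "DrugForm-DTA (2025)", "Zenodo", "Open Access", "None"),
    DatasetAccess("03", "EHRCon v1.0.1", "PhysioNet", "Restricted", "CITI Training"),
    DatasetAccess("04", "eICU-CRD v2.0", "PhysioNet", "Credentialed", "CITI Training"),
    DatasetAccess("05", "DREAMT v2.1.0", "PhysioNet", "Restricted", "CITI Training"),
    DatasetAccess("05", "SHHS", "NSRR (sleepdata.org)", "Restricted", "DUA"),
]


def requirements_for(project: str) -> list[DatasetAccess]:
    """Return every dataset row assigned to the given project number."""
    return [d for d in DATASETS if d.project == project]


def needs_citi(project: str) -> bool:
    """True if at least one dataset for the project lists CITI training."""
    return any(d.training == "CITI Training" for d in requirements_for(project))
```

For example, needs_citi("02") is False because Project 02 uses only the open-access Zenodo dataset, while requirements_for("05") returns both the DREAMT and SHHS rows.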

Section A: Open Access Datasets (No Training Required)

The following datasets are freely available and do not require CITI training or credentialing. Simply visit the link, agree to any terms of use, and download.

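Med-MAT (Project 01)

  • Platform: HuggingFace
  • Access: Open access; no training is required. A HuggingFace account is needed for the download.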

DrugForm-DTA (Project 02)

  • Platform: Zenodo
  • Description: Real-world noisy drug–target affinity benchmark that explicitly accounts for drug forms and assay conditions. Released 2025.
  • Link: https://zenodo.org/records/14949570
  • Access: Zenodo datasets are open access. Click “Download” on the record page. No account is strictly required, though a free Zenodo account is recommended.
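
As an alternative to clicking “Download”, the file list for the record can be read from Zenodo's REST API. The sketch below is illustrative only and rests on two assumptions that should be checked against the live record: that record metadata is served as JSON at /api/records/<id>, and that each entry under "files" carries a "key" (file name) and a links.self download URL. The function names are hypothetical.

```python
from __future__ import annotations

import json
import urllib.request

RECORD_ID = "14949570"  # taken from the record link above


def record_api_url(record_id: str) -> str:
    # Assumption: Zenodo exposes record metadata at /api/records/<id>.
    return f"https://zenodo.org/api/records/{record_id}"


def list_files(record: dict) -> list[tuple[str, str]]:
    # Assumption: each file entry has "key" and links["self"].
    # Returns (file name, download URL) pairs; empty list if no files are listed.
    return [(f["key"], f["links"]["self"]) for f in record.get("files", [])]


def fetch_record(record_id: str) -> dict:
    # Performs a network request; call this only when online.
    with urllib.request.urlopen(record_api_url(record_id), timeout=30) as resp:
        return json.load(resp)
```

Calling list_files(fetch_record(RECORD_ID)) would then yield the names and URLs to download, if the response has the assumed shape.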

Section B: PhysioNet Datasets (CITI Training Required)

Projects 01, 03, 04, and 05 rely on datasets hosted on PhysioNet (physionet.org). PhysioNet uses controlled-access tiers that require CITI training and a credentialing process. The affected datasets are:

Project | Dataset | Access Level
01: Multimodal RAG for Radiology | RaDialog Instruct Dataset v1.1.0 | Credentialed
03: Safe Clinical Reasoning for EHR | EHRCon v1.0.1 | Restricted
04: ICU Alarms for Critical Care | eICU-CRD v2.0 | Credentialed
05: Wearable Digital Twin for Mental Health | DREAMT v2.1.0 | Restricted

What You Need to Do

Accessing any PhysioNet credentialed or restricted dataset is a three-step process:

Step 1: Complete CITI Training

  • Go to the CITI Program website: https://www.citiprogram.org
  • Create an account using your institutional email address (not a personal email).
  • Select “Massachusetts Institute of Technology Affiliates” as your organization affiliation.
  • In the Human Subjects training category, enroll in the “Data or Specimens Only Research” course.
  • For the questionnaire, answer questions 1, 2, and 3. For question 5 (Conflicts of Interest), select “Yes.”
  • Complete all required modules, including Data or Specimens Only Research and Conflicts of Interest.
  • After completion, go to “Records” at the top of the CITI website and download your Completion Report (not the certificate). Select “View-Print-Share” under Completion Record and click “View/Print” to get the full report as a PDF.

Step 2: Create a PhysioNet Account and Submit Credentialing

  • Register for an account at https://physionet.org (or log in if you already have one).
  • Navigate to your user profile and complete the “Credentialing” page with your personal details.
  • On the “Training” page of your profile, upload the CITI Completion Report (PDF).
  • If you are a student or postdoc, provide your supervisor’s name and contact information as a reference.
  • Linking your ORCID iD to your PhysioNet profile is recommended.

Note: Approval may take several business days. Applications that are incomplete or missing the CITI Completion Report will be delayed.

Step 3: Sign the Data Use Agreement (DUA)

  • Once your credentialing is approved, navigate to the specific dataset page on PhysioNet.
  • In the “Files” section, sign the Data Use Agreement.
  • You will then be able to download the dataset files.
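
Once the DUA is signed, PhysioNet dataset pages typically show a wget command for bulk download. The sketch below assembles a command of that form in Python. The username, dataset slug, and version are hypothetical placeholders, and the helper name is illustrative; copy the exact command from the “Files” section of the dataset page rather than relying on this pattern.

```python
import shlex


def physionet_wget_command(username: str, slug: str, version: str) -> list[str]:
    # Assumption: files are served under /files/<slug>/<version>/ on physionet.org.
    url = f"https://physionet.org/files/{slug}/{version}/"
    # -r recursive, -N only fetch newer files, -c resume partial downloads,
    # -np do not ascend to the parent directory.
    # --ask-password prompts interactively so the password is never stored.
    return ["wget", "-r", "-N", "-c", "-np",
            "--user", username, "--ask-password", url]


# Placeholder values only; replace with your account and the real dataset path.
cmd = physionet_wget_command("your_username", "dataset-slug", "1.0.0")
print(shlex.join(cmd))
```

Building the command as a list keeps the password out of scripts and shell history, which fits the requirement not to share access credentials.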

Credentialed vs. Restricted Access: Both tiers require CITI training and the credentialing process above. “Restricted” datasets may have additional approval requirements.

PhysioNet Dataset Links

  • RaDialog Instruct (Project 01): Link
  • EHRCon (Project 03): Link
  • eICU-CRD (Project 04): Link
  • DREAMT (Project 05): Link

Section C: NSRR / Sleep Heart Health Study (Project 05)

  • Create an account at https://sleepdata.org.
  • Navigate to the SHHS dataset page: https://sleepdata.org/datasets/shhs
  • Click “Request Data Access” and complete the online Data Access and Use Agreement (DAUA).
  • Describe your intended use of the data.
  • Submit the request. Access is granted for 3 years and is free of charge.

Quick Reference Summary

Project | Platform | Key Requirement | Est. Timeline
01: Multimodal RAG | HuggingFace + PhysioNet | HF account (Med-MAT), CITI + Credentialing (RaDialog) | Immediate + Several business days
02: Drug-Target Affinity | Zenodo | None (open download) | Immediate
03: Clinical Reasoning | PhysioNet | CITI Training + Credentialing + DUA | Several business days
04: ICU Alarms | PhysioNet | CITI Training + Credentialing + DUA | Several business days
05: Wearable Digital Twin | PhysioNet + NSRR | CITI + Credentialing (DREAMT), DAUA (SHHS) | Several business days

Important Reminders

  • Start the access process early. PhysioNet credentialing may experience delays.
  • Use your institutional or university email address for all registrations.
  • For PhysioNet, upload the CITI Completion Report (not the certificate).
  • Students and postdocs must provide a supervisor or faculty advisor as a reference.
  • Do not share dataset access credentials or data files with anyone who has not been independently approved.
  • All datasets must be used in compliance with their respective Data Use Agreements. Do not send PhysioNet data to any third-party service unless that use fully complies with the dataset's DUA.

If you encounter any issues during the access process, please reach out to the team lead for guidance.
