Name: Open Molecular Crystals 2025 (OMC25)
Creator: Anuroop Sriram
License: https://creativecommons.org/licenses/by/4.0/
Keywords: Materials

Materials

Open Molecular Crystals 2025 (OMC25)

Over 27 million molecular crystal structures from density functional theory (DFT) relaxations of more than 230,000 randomly generated structures across 50,000 organic molecules. Released with machine-learned interatomic potentials.


Open DAC 2023 & 2025

Datasets for sorbent discovery in direct air capture (DAC). ODAC23 provides more than 38 million DFT calculations across more than 8,800 metal-organic frameworks (MOFs); ODAC25 expands to nearly 70 million DFT single-point calculations across 15,000 MOFs with four adsorbate species. Released with machine-learned interatomic potentials.


Open Catalyst 2020 & 2022 (OC20, OC22)

Datasets for catalyst discovery to enable renewable energy storage. OC20 and OC22 together contain approximately 1.34 million molecular relaxations and 274.8 million DFT single-point calculations. Released with baseline models and code.


MRI Acceleration

fastMRI Datasets

Large-scale dataset of raw MRI measurements for AI-accelerated reconstruction. Contains 1,500 fully sampled knee MRIs, 7,000 fully sampled brain MRIs, and DICOM images from 10,000 clinical knee exams.


Speech Recognition

Multilingual LibriSpeech (MLS)

50,000 hours of labeled speech across 8 languages, derived from LibriVox audiobooks. Includes language models and baseline automatic speech recognition (ASR) models for all languages.
