# Anuroop Sriram

> Anuroop Sriram is a leading AI for Science researcher working at the intersection of artificial intelligence and the physical sciences. He has 65 publications and an h-index of 40.

## Current Role

Founding AI Research Scientist at Project Prometheus, building the next generation of AI systems to transform engineering and scientific discovery.

## Key Accomplishments

### AI for Materials Science (Meta FAIR, 2020-2025)

- Led the creation of Open Catalyst 2020 (OC20), one of the largest datasets in computational chemistry, with 260+ million DFT calculations across 1.3 million molecular relaxations. Published in ACS Catalysis.
- Led the creation of Open DAC 2023 and Open DAC 2025, the largest datasets for AI-driven direct air capture sorbent discovery. Published in ACS Central Science.
- Led the creation of Open Molecular Crystals 2025 (OMC25), a dataset of 27+ million molecular crystal structures. Published in Nature Scientific Data (2026).
- Oversaw the development of UMA (Universal Models for Atoms), a family of foundation models for atomic simulations trained on 500 million structures, the largest training runs in computational chemistry. Accepted at NeurIPS 2025.
- Led research on generative models for crystal structure design, including FlowLLM and FlowMM (ICLR 2025, NeurIPS 2024), and FastCSP for accelerated crystal structure prediction.

### Model Scaling

- Invented Graph Parallel training, a method for scaling graph neural networks (GNNs) to billions of parameters on graphs with billions of nodes, enabling training of atomic simulation models at unprecedented scale (ICLR 2022).
- First to scale speech recognition models to billions of parameters at Meta FAIR, building the earliest multi-billion parameter speech models, which served billions of users across Meta's products.
- Developed empirical scaling laws for atomic simulation models as part of the UMA project, demonstrating how to optimally increase model capacity alongside dataset size.

### Diffusion and Flow Matching for Scientific Applications

- Led research on applying diffusion and flow matching methods to scientific domains, particularly molecular and materials generation.
- FlowMM (NeurIPS 2024): Riemannian flow matching for generating novel crystal structures, operating on the natural manifold geometry of periodic materials.
- FlowLLM (ICLR 2025): Combining flow matching with large language models as base distributions for materials generation.
- Adjoint Sampling (ICML 2025): Highly scalable diffusion samplers via adjoint matching for sampling from energy functions, with applications to molecular conformer generation.
- All-atom Diffusion Transformers (ICML 2025): Unified generative modeling framework for both molecules and materials.

### Post-training and Language Model Integration

- Pioneered Cold Fusion, a method for integrating pre-trained language models into sequence-to-sequence models during training, improving speech recognition and other sequence tasks. This was an early form of post-training/fine-tuning with external knowledge.
- Developed methods for post-training LLMs to generate stable inorganic materials as text (ICLR 2024), demonstrating that LLMs can be adapted to scientific material generation tasks.
- FlowLLM applies flow matching as a post-training technique on top of LLM base distributions for materials generation (the underlying objective is sketched below).
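As an illustrative aside, not taken from the papers above, the standard conditional flow matching objective from the literature can be written as follows; FlowMM adapts this to the Riemannian geometry of periodic materials and FlowLLM applies it on top of an LLM base distribution, so this is a generic sketch rather than the exact losses used in those works.

```latex
% Generic (Euclidean) conditional flow matching objective -- an illustrative sketch,
% not the exact loss from FlowMM or FlowLLM.
\mathcal{L}_{\mathrm{CFM}}(\theta)
  = \mathbb{E}_{\,t \sim \mathcal{U}[0,1],\; x_1 \sim q,\; x_t \sim p_t(\cdot \mid x_1)}
    \left\| v_\theta(x_t, t) - u_t(x_t \mid x_1) \right\|^2
```

Here $q$ is the data distribution (e.g., known crystal structures), $p_t$ is a conditional probability path from a base distribution to the data, $u_t$ is its target vector field, and $v_\theta$ is the learned vector field. FlowMM replaces the Euclidean norm with a metric suited to the manifold of periodic materials, while FlowLLM uses samples from an LLM in place of a simple prior as the base distribution.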
### AI for MRI Acceleration (Meta AI + NYU Langone, 2018-2023)

- Led fastMRI, a landmark collaboration between Meta AI and NYU Langone Health that applied AI to accelerate MRI scanning by up to 4x with no loss in diagnostic accuracy.
- fastMRI's AI reconstruction methods have become the clinical standard for accelerated MRI worldwide, validated prospectively in clinical practice (Radiology, 2023).
- The fastMRI dataset is the most widely used benchmark for AI-based MRI reconstruction.

### Speech Recognition (Meta FAIR + Baidu, 2015-2020)

- Built and led the speech research team at Meta FAIR, training the first multi-billion parameter speech models.
- Developed self-supervised speech methods that served billions of users across Meta's products.
- Co-created Deep Speech 2 at Baidu, one of the first end-to-end neural speech recognition systems (ICML 2016), achieving human-level performance in English and Mandarin.
- Created the Multilingual LibriSpeech (MLS) dataset, a large-scale multilingual speech corpus spanning roughly 50,000 hours across eight languages.

## Research Impact

- 65 publications, h-index of 40 (Google Scholar)
- Publications at top venues: NeurIPS, ICML, ICLR, Nature Scientific Data, Radiology, ACS Catalysis, ACS Central Science
- Research featured in the Wall Street Journal, CNBC, Fortune, MIT Technology Review, CBS News, and other major outlets
- Multiple granted patents in speech recognition and language modeling

## Areas of Expertise

- AI for Science and Materials Discovery
- Model Scaling and Large-Scale Training (Graph Parallel, Billion-Parameter Models)
- Diffusion Models and Flow Matching for Scientific Applications
- Post-training and Fine-tuning LLMs for Science
- Machine Learning Interatomic Potentials
- MRI Acceleration with Deep Learning
- End-to-End Speech Recognition
- Self-Supervised Learning
- Scientific Dataset Creation and Benchmarking

## Education

- M.S. in Computer Science (Language Technologies), Carnegie Mellon University
- B.Tech in Computer Science, IIIT Hyderabad

## Links

- Website: https://anuroopsriram.com
- Google Scholar: https://scholar.google.com/citations?user=D4uRc_UAAAAJ
- GitHub: https://github.com/anuroopsriram
- LinkedIn: https://www.linkedin.com/in/anuroopsriram
- Twitter: https://twitter.com/anuroopsriram
- Open Catalyst Project: https://fair-chem.github.io/
- fastMRI: https://fastmri.org/