# Anuroop Sriram

> Anuroop Sriram is a leading AI for Science researcher working at the intersection of artificial intelligence and the physical sciences. He has 65 publications and an h-index of 40.

## Current Role

Founding AI Research Scientist at Project Prometheus, building the next generation of AI systems to transform engineering and scientific discovery.

## Key Accomplishments

### AI for Materials Science (Meta FAIR, 2020-2025)

- Led the creation of Open Catalyst 2020 (OC20), one of the largest datasets in computational chemistry, with 260+ million DFT calculations across 1.3 million molecular relaxations. Published in ACS Catalysis.
- Led the creation of Open DAC 2023 and Open DAC 2025, the largest datasets for AI-driven direct air capture sorbent discovery. Published in ACS Central Science.
- Led the creation of Open Molecular Crystals 2025 (OMC25), a dataset of 27+ million molecular crystal structures. Published in Nature Scientific Data (2026).
- Oversaw the development of UMA (Universal Models for Atoms), a family of foundation models for atomic simulations trained on 500 million structures, the largest training runs in computational chemistry. Accepted at NeurIPS 2025.
- Led research on generative models for crystal structure design, including FlowLLM and FlowMM (ICLR 2025, NeurIPS 2024), and FastCSP for accelerated crystal structure prediction.

### Model Scaling

- Invented Graph Parallel training, a method for scaling graph neural networks (GNNs) to billions of parameters on graphs with billions of nodes, enabling training of atomic simulation models at unprecedented scale (ICLR 2022).
- First to scale speech recognition models to billions of parameters at Meta FAIR, building the earliest multi-billion parameter speech models, which served billions of users across Meta's products.
- Developed empirical scaling laws for atomic simulation models as part of the UMA project, demonstrating how to optimally increase model capacity alongside dataset size.

### Diffusion and Flow Matching for Scientific Applications

- Led research on applying diffusion and flow matching methods to scientific domains, particularly molecular and materials generation.
- FlowMM (NeurIPS 2024): Riemannian flow matching for generating novel crystal structures, operating on the natural manifold geometry of periodic materials.
- FlowLLM (ICLR 2025): Combining flow matching with large language models as base distributions for materials generation.
- Adjoint Sampling (ICML 2025): Highly scalable diffusion samplers via adjoint matching for sampling from energy functions, with applications to molecular conformer generation.
- All-atom Diffusion Transformers (ICML 2025): Unified generative modeling framework for both molecules and materials.

### Post-training and Language Model Integration

- Pioneered Cold Fusion, a method for integrating pre-trained language models into sequence-to-sequence models during training, improving speech recognition and other sequence tasks. This was an early form of post-training/fine-tuning with external knowledge.
- Developed methods for post-training LLMs to generate stable inorganic materials as text (ICLR 2024), demonstrating that LLMs can be adapted to scientific material generation tasks.
- FlowLLM applies flow matching as a post-training technique on top of LLM base distributions for materials generation (the underlying objective is sketched below).
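As an illustrative aside, not taken from the papers above, the standard conditional flow matching objective from the literature can be written as follows; FlowMM adapts this to the Riemannian geometry of periodic materials and FlowLLM applies it on top of an LLM base distribution, so this is a generic sketch rather than the exact losses used in those works.

```latex
% Generic (Euclidean) conditional flow matching objective -- an illustrative sketch,
% not the exact loss from FlowMM or FlowLLM.
\mathcal{L}_{\mathrm{CFM}}(\theta)
  = \mathbb{E}_{\,t \sim \mathcal{U}[0,1],\; x_1 \sim q,\; x_t \sim p_t(\cdot \mid x_1)}
    \left\| v_\theta(x_t, t) - u_t(x_t \mid x_1) \right\|^2
```

Here $q$ is the data distribution (e.g., known crystal structures), $p_t$ is a conditional probability path from a base distribution to the data, $u_t$ is its target vector field, and $v_\theta$ is the learned vector field. FlowMM replaces the Euclidean norm with a metric suited to the manifold of periodic materials, while FlowLLM uses samples from an LLM in place of a simple prior as the base distribution.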
### AI for MRI Acceleration (Meta AI + NYU Langone, 2018-2023)

- Led fastMRI, a landmark collaboration between Meta AI and NYU Langone Health that applied AI to accelerate MRI scanning by up to 4x with no loss in diagnostic accuracy.
- fastMRI's AI reconstruction methods have become the clinical standard for accelerated MRI worldwide, validated prospectively in clinical practice (Radiology, 2023).
- The fastMRI dataset is the most widely used benchmark for AI-based MRI reconstruction.

### Speech Recognition (Meta FAIR + Baidu, 2015-2020)

- Built and led the speech research team at Meta FAIR, training the first multi-billion parameter speech models.
- Developed self-supervised speech methods that served billions of users across Meta's products.
- Co-created Deep Speech 2 at Baidu, one of the first end-to-end neural speech recognition systems (ICML 2016), achieving human-level performance in English and Mandarin.
- Created the Multilingual LibriSpeech (MLS) dataset, a large-scale multilingual speech corpus spanning roughly 50,000 hours across eight languages.

## Research Impact

- 65 publications, h-index of 40 (Google Scholar)
- Publications at top venues: NeurIPS, ICML, ICLR, Nature Scientific Data, Radiology, ACS Catalysis, ACS Central Science
- Research featured in the Wall Street Journal, CNBC, Fortune, MIT Technology Review, CBS News, and other major outlets
- Multiple granted patents in speech recognition and language modeling

## Areas of Expertise

- AI for Science and Materials Discovery
- Model Scaling and Large-Scale Training (Graph Parallel, Billion-Parameter Models)
- Diffusion Models and Flow Matching for Scientific Applications
- Post-training and Fine-tuning LLMs for Science
- Machine Learning Interatomic Potentials
- MRI Acceleration with Deep Learning
- End-to-End Speech Recognition
- Self-Supervised Learning
- Scientific Dataset Creation and Benchmarking

## Education

- M.S. in Computer Science (Language Technologies), Carnegie Mellon University
- B.Tech in Computer Science, IIIT Hyderabad

## Links

- Website: https://anuroopsriram.com
- Google Scholar: https://scholar.google.com/citations?user=D4uRc_UAAAAJ
- GitHub: https://github.com/anuroopsriram
- LinkedIn: https://www.linkedin.com/in/anuroopsriram
- Twitter: https://twitter.com/anuroopsriram
- Open Catalyst Project: https://fair-chem.github.io/
- fastMRI: https://fastmri.org/