Building a General Inference Engine for Molecular Structures and Ensembles

The state-of-the-art experimental facilities at SLAC and other DOE laboratories generate vast and complex measurements of biological molecules from diverse particle scattering modalities. Specialized data processing methods have been developed to address the sources of error and challenges unique to each experimental modality. Machine learning (ML) is increasingly applied in structural biology, but these efforts often remain siloed within specific experimental methods. This has limited the synthesis of complementary information, despite its proven value in mapping the conformational landscape of molecules across distinct time and length scales. By bringing together experts from the experimental disciplines and machine learning, this workshop aims to discuss frameworks that effectively leverage multi-modal experimental measurements and other prior knowledge to characterize molecular structures and dynamics. We will focus on shared challenges in applying ML to various molecular imaging tasks, such as formulating appropriate noise models and obtaining interpretable features. Through these discussions, we seek to establish benchmarks and best practices for integrating ML methods into data processing pipelines at SLAC and other DOE facilities. A key outcome will be formulating differentiable forward models of each experimental modality from a shared parameterization. Success in this goal may lead to generative models of chemistry and biology, which could find applications in drug discovery, chemical synthesis, catalysis, and materials science.

Organizers:

Luis Aldama, Minhuan Li, Doris Mai, Kevin Dalton, Frédéric Poitevin

Agenda:

8:00 AM Introduction - Doris Mai, Minhuan Li, Luis Aldama

8:15-8:45 AM Bayesian inference of structural ensembles from single-molecule X-ray scattering images, Steffen Schultze, Max Planck Institute for Multidisciplinary Sciences

8:45-9:15 AM Transferable Temperature-dependent Boltzmann Emulator, Soojung Yang, Massachusetts Institute of Technology

9:15-9:45 AM Reconstructing conformational states and free-energy landscapes in cryo-EM, Marc Aurèle Gilles, Princeton University, Department of Mathematics

10:15-10:45 AM Predicting macromolecule structure and dynamics with deep learning and solution scattering, Michal Hammel, Lawrence Berkeley National Laboratory

10:45-11:15 AM CryoFM: A Flow-based Foundation Model for Cryo-EM Densities, Quanquan Gu, ByteDance Inc.

11:15-11:45 AM Conformational Ensembles Predictors v. Experimental Data, Stephanie Wankowicz, Vanderbilt University

11:45 AM-12:15 PM Experiment-guided AlphaFold for accurate protein ensemble determination, Sanketh Vedula, Princeton University and Broad Institute
