Building a General Inference Engine for Chemical Dynamics

The facilities at SLAC and other DOE laboratories deliver unprecedented access to chemical dynamics across a broad range of timescales. However, each experimental modality presents unique challenges and limitations. To successfully infer the dynamics of complex systems such as biological macromolecules and advanced materials, which exhibit motions across many time and length scales, information from a variety of sources must be integrated. This includes experimental observations from particle sources at DOE labs, but might also include conditioning information from other sources. For instance, in the biological setting, evolutionary information from deep sequencing has strongly informed models of protein structure and dynamics. By bringing together experts from experimental disciplines and machine learning, this workshop aims to develop a specification for a generic inference engine for chemical dynamics that admits multi-modal experimental observations alongside conditioning information. Through discussion, we seek to identify suitable parameterizations of chemical systems that make experimental observables cheap to compute and differentiable while allowing for changes in bond topology. We will begin to identify pre-trained foundation models that can supply conditioning information, differentiable forward models for experimental observables, appropriate generative models, and inference algorithms to train such models. In doing so, we will produce a blueprint for a machine-learning framework that can synthesize multi-modal experimental data with generic conditioning information to deliver a time-dependent ensemble description of any (bio)chemical sample. Success may ultimately lead to generative models of chemistry and biology with applications in medicine, chemical synthesis, bioremediation, energy storage, and catalysis.

Organizers:

Kevin Dalton, Minhuan Li, Frédéric Poitevin

Agenda:

8:00 AM Introduction - Kevin Dalton & Minhuan Li

8:30-9:15 AM TBA - Andrej Sali, University of California at San Francisco

9:15-10:00 AM Automated protein structure determination from raw NMR spectra with the NMRtist platform and deep learning method ARTINA - Piotr Klukowski, ETH Zürich

10:30-11:15 AM AlphaFold as a Prior: Guiding Protein Predictions with Experimental Data - Alisia Fadini, University of Cambridge

11:15-12:00 PM Combining simulations and cryo-electron microscopy experiments to infer conformational probability landscapes - Pilar Cossio, Flatiron Institute

1:00-1:45 PM Uncertainty in Single Particle Analysis by CryoEM - Carlos Oscar Sorzano, Centro Nacional de Biotecnología (CSIC)

1:45-2:30 PM A benchmark for timescales in protein dynamics - Hannah Wayment-Steele, Brandeis University

3:00-3:45 PM Hidden in plain sight – What is underlying structural biology models - Stephanie Wankowicz, UCSF

3:45-4:30 PM Combining generative models and cryo-EM data for revealing the structure and the motion of biomolecular complexes - Axel Levy, Stanford University

4:30-5:15 PM Pushing the limits of resolution with solution X-ray scattering - Thomas Grant, University at Buffalo

5:00 PM Discussion
