Combining molecular modelling and machine learning for accelerated reaction screening

Thijs Stuyver
Ecole Nationale Supérieure de Chimie de Paris (ENSCP), Université PSL, Paris, France

Wednesday, 27 September 2023, 11:00



Machine learning (ML) has had a significant impact on various subfields of science in recent years. Part of the reason for its success is that it enables the generation of predictive models with only limited domain knowledge: usually, only minor modifications to a generic ML algorithm need to be made to generate an effective model for a specific application. This holds, of course, only when sufficient data is available, which usually means hundreds of thousands or even millions of data points. In chemistry, there are several predictive tasks for which we have this abundance of data, but for most specialized applications, particularly those related to chemical reactivity, we do not have this luxury.
In principle, computational chemistry offers a way out when limited experimental data is available, since it enables data generation in a cheap and easy-to-automate manner. In the first part of my talk, I will explore this approach in more detail, focusing specifically on an ML-accelerated computational workflow that I recently developed to screen for promising bioorthogonal click reactions.
While quantum chemical simulations of reactivity tend to be relatively cheap compared to experimental characterization, the cost of generating sufficient training data for a machine learning model still quickly becomes prohibitive. As such, in the second part of my talk, I will discuss strategies to improve the data efficiency of ML-based computational workflows for reactivity prediction. Specifically, I will focus on models based on an intermediate, valence-bond-inspired representation and demonstrate that these outperform conventional machine learning models by a wide margin for hydrogen atom transfer reactions in the low-data regime.



_________________________________

References:
N. Casetti, J. E. Alfonso-Ramos, C. W. Coley, and T. Stuyver, Chem. Eur. J. 2023, e202301957.
T. Stuyver, K. Jorner, and C. W. Coley, Sci. Data 2023, 10, 66.
T. Stuyver and C. W. Coley, Chem. Eur. J. 2023, e202300387.