One of the critical challenges in applying machine learning (ML) techniques to predict physicochemical properties of biomolecules and materials is how to represent these systems. Some well-known representations are based on the geometric features of molecules; however, the performance of an ML model can be improved by including other relevant information in the molecular descriptors.
This seminar will discuss the use of molecular descriptors that incorporate global and local electronic quantum-mechanical (QM) properties computed with the third-order density functional tight-binding method supplemented with many-body dispersion interactions (DFTB3+MBD). The performance of geometric representations, alone and in combination with QM descriptors, is compared using Kernel Ridge Regression to predict QM properties as well as biological responses such as lipophilicity and toxicity. For the QM targets, we considered small organic molecules from the QM7-X dataset, whereas for the prediction of biological responses we investigated larger and more flexible drug-like molecules from the ChEMBL and LD50 datasets. The influence of conformational sampling on the performance of the ML models has also been studied for the latter datasets.
Alejandra Hinostroza earned her Bachelor's degree in Physics from the National University of Engineering (Peru) in 2023. During her undergraduate studies, she was selected to participate in the REPU program, through which she completed an internship with Dr. Leonardo Medrano at the University of Luxembourg. She then received two additional scholarships for research internships: one at Johannes Gutenberg-Universität Mainz (Germany) under the supervision of Dr. Hartmut Wittig, and another at the University of Alberta (Canada) with Dr. Khaled Barakat’s group. These internships focused on the intersection of Machine Learning and Life Sciences, with emphases on Lattice QCD and molecular design, respectively. She currently works part-time at Thoth Biosimulations, specializing in data analysis, particularly of Molecular Dynamics simulations. She is also conducting research for her professional thesis, performing electronic-structure calculations of crystals using density functional theory.