Xiaoli Fern and Cory Simon

Collaboration develops AI to discover new molecules and materials

Introduction

When chemical engineer Cory Simon and computer scientist Xiaoli Fern first met at a social event at Oregon State University, little did they know that a friendly game of cornhole would lead to a long, fruitful collaboration. Their interdisciplinary partnership, which merges artificial intelligence and chemistry, aims to speed up and reduce the cost of discovering materials and molecules.

The researchers have combined their expertise to create machine learning models capable of predicting the properties of new materials and molecules, with implications for various industry applications, including separating, storing, and sensing gases.

"Machine learning can play a crucial role in accelerating material and molecular discovery by predicting their properties and making the search more efficient," Fern said.

Harnessing graph neural networks for molecular property prediction

Their work centers on graph neural networks (GNNs), which are particularly well suited to prediction tasks on molecules.

"GNNs are a class of deep learning models that can operate directly on graph-structured data," Fern said. "Since molecules can be naturally represented as graphs, with atoms as nodes and bonds as edges, GNNs are a natural fit for molecular property prediction tasks. In particular, we employ message-passing neural networks to encode local and global features of molecular graphs effectively."
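To make the idea of message passing concrete, here is a minimal Python sketch. It is not Fern and Simon's model: the one-hot atom features, the fixed weights (w_self, w_neigh and head), sum aggregation, and the sum-pooling readout are all illustrative assumptions, and nothing is learned. Each step lets an atom mix in information from its bonded neighbors (local features), and the readout pools all atoms into one vector for the whole molecule (global features).

```python
def message_passing_step(features, edges, w_self=0.5, w_neigh=0.5):
    """One round of message passing: each atom's new feature vector is a
    weighted sum of its own features and the summed features of its neighbors."""
    neighbors = {i: [] for i in range(len(features))}
    for a, b in edges:  # bonds are undirected
        neighbors[a].append(b)
        neighbors[b].append(a)
    updated = []
    for i, own in enumerate(features):
        # Aggregate messages from bonded neighbors by summation.
        agg = [0.0] * len(own)
        for j in neighbors[i]:
            agg = [m + f for m, f in zip(agg, features[j])]
        updated.append([w_self * s + w_neigh * m for s, m in zip(own, agg)])
    return updated


def readout(features):
    """Sum-pool atom features into a single graph-level vector."""
    return [sum(column) for column in zip(*features)]


def predict_property(features, edges, num_steps=2, head=(1.0, -0.5)):
    """Run a few message-passing steps, pool, and apply a fixed linear head."""
    h = features
    for _ in range(num_steps):
        h = message_passing_step(h, edges)
    graph_vector = readout(h)
    return sum(w * x for w, x in zip(head, graph_vector))


if __name__ == "__main__":
    # Heavy-atom graph of ethanol, C-C-O, with one-hot features [is_C, is_O].
    atoms = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    bonds = [(0, 1), (1, 2)]
    print(predict_property(atoms, bonds))  # 2.375
```

Because the aggregation and readout are sums, relabeling the atoms (for example, listing the oxygen first) gives the same prediction, which is one reason this style of model suits molecules.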

The crystal structure of a metal-organic framework, IRMOF-1. This material exhibits nano-sized pores that adsorb gas molecules. Loosely, the internal surface of IRMOF-1 provides "parking spaces" for gas molecules. Since different gas species are attracted to the pore walls with different strengths, these materials can also be exploited to separate gases. Hundreds of thousands of different materials can be made in the lab, and AI can help sort through the possibilities and predict which structure will be optimal for a given gas storage, separation, or sensing task.

Fern and Simon's model ingests molecular structures and uses GNNs to predict properties such as solubility, gas adsorption capacity, or even the perceived smell of a molecule. To achieve this, the researchers employ a combination of supervised and unsupervised learning techniques, including pre-training.

"The ability to computationally predict the properties of molecules and materials accurately is essential for the efficient discovery of new molecules and materials, as it helps researchers focus on the most promising candidates in the lab," Simon said. "Moreover, if we quantify the uncertainty in the model's predictions, we can guide decision-making in the lab for optimization and exploration of molecules and materials."
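One common way to act on uncertainty is sketched below, under stated assumptions: train an ensemble of models, treat the spread of their predictions as the uncertainty, and rank candidates by an upper confidence bound (the mean plus kappa times the standard deviation). The candidate names and predicted values are made up, and the upper confidence bound is only one of several acquisition rules; the article does not say which rule the researchers use.

```python
import statistics


def ensemble_summary(predictions):
    """Mean and spread (population standard deviation) of one candidate's
    predictions across an ensemble of models."""
    return statistics.mean(predictions), statistics.pstdev(predictions)


def rank_by_ucb(candidates, kappa=1.0):
    """Rank candidates by the upper confidence bound mean + kappa * std.
    kappa = 0 is purely greedy; larger kappa favors uncertain candidates."""
    scored = []
    for name, predictions in candidates.items():
        mu, sigma = ensemble_summary(predictions)
        scored.append((mu + kappa * sigma, name))
    scored.sort(reverse=True)  # highest score first
    return [name for _, name in scored]


if __name__ == "__main__":
    # Made-up gas-uptake predictions from a four-model ensemble (arbitrary units).
    candidates = {
        "mof_A": [10.5, 10.5, 10.5, 10.5],  # confident, high mean
        "mof_B": [8.0, 12.0, 8.0, 12.0],    # slightly lower mean, uncertain
        "mof_C": [9.0, 9.0, 9.0, 9.0],      # confident, low mean
    }
    print(rank_by_ucb(candidates, kappa=0.0))  # ['mof_A', 'mof_B', 'mof_C']
    print(rank_by_ucb(candidates, kappa=1.0))  # ['mof_B', 'mof_A', 'mof_C']
```

With kappa set to 0 the ranking favors the candidate with the highest predicted mean; raising kappa moves the uncertain candidate to the top, trading exploitation for exploration.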

Addressing scarcity of labeled data

One key challenge in applying machine learning to molecular discovery is the scarcity of labeled data. To address this, Fern and Simon have employed transfer learning, a technique that leverages knowledge learned from one task to improve performance on another, related task.

"We utilize transfer learning to make the most of the limited labeled data we have," Fern said. "By pre-training our GNNs on large, unlabeled data sets, we can extract meaningful representations that can be fine-tuned on smaller, labeled data sets to achieve better performance."
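The pre-train/fine-tune pattern can be sketched as follows. This is a simplified illustration, not the researchers' pipeline: the "pre-trained" encoder is a hypothetical fixed linear map (PRETRAINED_W) standing in for a GNN pre-trained on unlabeled molecules, it stays frozen, and only a small linear head is fitted by gradient descent on four made-up labeled examples. Updating the encoder weights as well (full fine-tuning) is an equally common variant.

```python
# Hypothetical "pre-trained" encoder weights, standing in for a GNN that was
# pre-trained on a large unlabeled data set. They are frozen: never updated below.
PRETRAINED_W = [[0.5, 1.0], [1.0, -0.5]]


def encode(x):
    """Map raw descriptors x (length 2) to the frozen learned representation."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in PRETRAINED_W]


def fine_tune_head(labeled, lr=0.05, epochs=2000):
    """Fit only a linear head, y ~ w . encode(x) + b, on a small labeled set
    using full-batch gradient descent on the mean squared error."""
    w, b = [0.0, 0.0], 0.0
    n = len(labeled)
    for _ in range(epochs):
        grad_w, grad_b = [0.0, 0.0], 0.0
        for x, y in labeled:
            z = encode(x)
            err = sum(wi * zi for wi, zi in zip(w, z)) + b - y
            grad_w = [g + 2.0 * err * zi / n for g, zi in zip(grad_w, z)]
            grad_b += 2.0 * err / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


if __name__ == "__main__":
    # Four made-up labeled examples: (raw descriptors, measured property).
    labeled = [((1.0, 0.0), 3.5), ((0.0, 1.0), 1.0), ((1.0, 1.0), 3.5), ((2.0, 1.0), 6.0)]
    w, b = fine_tune_head(labeled)
    print([round(wi, 3) for wi in w], round(b, 3))  # [1.0, 2.0] 1.0
```

The made-up labels were constructed as z0 + 2*z1 + 1 in the encoder's representation, so the fitted head recovers weights close to [1, 2] and a bias close to 1 from only four examples.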

Machine learning is also often used to build predictive models in domains where the underlying rules are not explicitly known. Some learning algorithms are more interpretable than others. Decision trees, for example, are considered interpretable because each prediction follows from a sequence of explicit questions about specific features, which gives a clear basis for the decision. Modern methods, particularly deep learning, are generally less interpretable, which makes it difficult to determine which structural features contribute to a given prediction.
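A one-split decision tree (a "stump") shows why such models are considered interpretable: the fitted model is literally a readable rule. The sketch below, in plain Python with made-up data, picks the threshold on a single hypothetical feature (pore_diameter) that minimizes squared error and prints the resulting rule.

```python
def fit_stump(xs, ys):
    """Fit a one-split regression tree on one feature: try a threshold between
    each pair of consecutive distinct values and keep the one with the lowest
    total squared error. Returns (threshold, left_mean, right_mean)."""
    values = sorted(set(xs))
    if len(values) < 2:
        raise ValueError("need at least two distinct feature values")
    best = None
    for low, high in zip(values, values[1:]):
        t = (low + high) / 2
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        mean_l, mean_r = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - mean_l) ** 2 for y in left) + sum((y - mean_r) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, mean_l, mean_r)
    return best[1:]


def as_rule(feature, threshold, left, right):
    """Render the fitted stump as a human-readable rule."""
    return f"if {feature} <= {threshold}: predict {left} else: predict {right}"


if __name__ == "__main__":
    # Made-up data: a hypothetical pore-diameter feature and a property value.
    pore_diameter = [4.0, 5.0, 6.0, 9.0, 10.0, 11.0]
    prop = [1.0, 1.0, 1.0, 3.0, 3.0, 3.0]
    t, left, right = fit_stump(pore_diameter, prop)
    print(as_rule("pore_diameter", t, left, right))
    # if pore_diameter <= 7.5: predict 1.0 else: predict 3.0
```

A deep network trained on the same data could predict just as well, but it offers no comparable one-line rule to inspect.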

There is much ongoing research into explaining the predictions of AI systems. Explaining graph neural networks that operate on molecules is especially challenging, because explanation methods often measure the importance of a specific node or edge by removing it from the graph and observing the effect on the prediction. For a chemical structure, removing an atom or bond results in an invalid structure, which complicates interpretation. Overall, explaining the basis for machine learning predictions in complex domains remains a difficult problem, and one that Fern and Simon's group is addressing.
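Deletion-based explanation, and the problem it creates for molecules, can be sketched in a few lines. Assumptions: toy_model is a hypothetical stand-in for a trained GNN (a fixed score per element plus a per-bond term), only heavy atoms are modeled, and "validity" is reduced to a single crude check, whether the perturbed graph is still one connected piece.

```python
def remove_atom(atoms, bonds, k):
    """Delete atom k and every bond touching it; renumber the remaining atoms."""
    keep = [i for i in range(len(atoms)) if i != k]
    new_index = {old: new for new, old in enumerate(keep)}
    new_atoms = [atoms[i] for i in keep]
    new_bonds = [(new_index[a], new_index[b]) for a, b in bonds if k not in (a, b)]
    return new_atoms, new_bonds


def is_single_fragment(atoms, bonds):
    """True if the graph is one connected piece (a crude validity proxy)."""
    if not atoms:
        return False
    adjacency = {i: set() for i in range(len(atoms))}
    for a, b in bonds:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, stack = {0}, [0]
    while stack:  # iterative depth-first search from atom 0
        for nxt in adjacency[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen) == len(atoms)


def toy_model(atoms, bonds):
    """Hypothetical stand-in for a trained GNN: per-element scores plus a bond term."""
    score = {"C": 1.0, "N": 1.5, "O": 2.0}
    return sum(score[a] for a in atoms) + 0.25 * len(bonds)


def deletion_importance(model, atoms, bonds):
    """Importance of each atom = drop in the prediction when that atom is removed.
    Also reports whether the perturbed graph is still a single fragment."""
    base = model(atoms, bonds)
    report = []
    for k, element in enumerate(atoms):
        sub_atoms, sub_bonds = remove_atom(atoms, bonds, k)
        report.append((element, base - model(sub_atoms, sub_bonds),
                       is_single_fragment(sub_atoms, sub_bonds)))
    return report


if __name__ == "__main__":
    # Heavy-atom backbone of dimethyl ether, C-O-C.
    for row in deletion_importance(toy_model, ["C", "O", "C"], [(0, 1), (1, 2)]):
        print(row)
    # ('C', 1.25, True)
    # ('O', 2.5, False)
    # ('C', 1.25, True)
```

Deleting the central oxygen receives the largest importance score, but the graph the model scored to obtain it is two disconnected carbons, not a molecule. Deleting a terminal carbon keeps the graph connected yet still leaves the oxygen with a missing bond, a valence problem this simple check does not catch.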

The future of interdisciplinary collaboration

Looking ahead, Fern and Simon are optimistic about the potential of their interdisciplinary collaboration to advance the field.

"We believe that the marriage of machine learning and chemistry will continue to revolutionize how we discover and design new materials and molecules," Fern said.

Indeed, Fern and Simon's collaboration is a compelling example of how interdisciplinary research can drive innovation. As their work evolves, they are exploring new ways to enhance their models and expand their applications, for example by incorporating multiple sources of data to strengthen a model's predictions.

Active learning strategies

Another area of exploration is the development of active learning strategies, which involve iteratively refining the model by selecting the most informative examples for training.

"Active learning allows our model to iteratively select the most valuable data points to learn from, thereby improving its performance and reducing the need for extensive labeled data," Fern said. "This can be particularly useful in molecular discovery, where acquiring labeled data can be time-consuming and costly."
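A pool-based active learning loop can be sketched as below. The model is a deliberately simple 1-nearest-neighbor predictor, and the uncertainty proxy (distance from a candidate to the closest labeled point) is an assumption for illustration; a real system would more likely use the model's own uncertainty estimate, such as the spread of an ensemble. The oracle function stands in for a costly lab measurement or simulation and is hypothetical.

```python
def nearest_labeled(x, labeled):
    """(distance, label) of the labeled example closest to x."""
    return min((abs(x - xl), yl) for xl, yl in labeled)


def predict(x, labeled):
    """1-nearest-neighbor prediction from the labeled examples."""
    return nearest_labeled(x, labeled)[1]


def active_learning(pool, oracle, initial, budget):
    """Pool-based active learning. Each round labels the candidate whose
    nearest labeled example is farthest away, used here as an uncertainty proxy."""
    labeled = [(x, oracle(x)) for x in initial]
    unlabeled = [x for x in pool if x not in initial]
    queried = []
    for _ in range(budget):
        if not unlabeled:
            break
        # Most "uncertain" candidate under the distance-based proxy.
        x_next = max(unlabeled, key=lambda x: nearest_labeled(x, labeled)[0])
        labeled.append((x_next, oracle(x_next)))  # the costly experiment
        unlabeled.remove(x_next)
        queried.append(x_next)
    return labeled, queried


if __name__ == "__main__":
    pool = list(range(11))  # 11 hypothetical candidates on a 1-D descriptor

    def measure(x):  # hypothetical oracle standing in for a lab measurement
        return x * x

    labeled, queried = active_learning(pool, measure, initial=[0], budget=3)
    print(queried)  # [10, 5, 2]
```

Starting from a single labeled point, the loop first queries the far end of the range and then fills in the largest gaps, so a small labeling budget is spread across the candidate space rather than spent on near-duplicates.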

Expanding applications and collaborations

The researchers are also examining the potential for their model to be applied to other domains, such as polymers.

"Our fundamental work has the potential to impact a wide range of chemical industries, from the development of new nano-porous materials for gas separations to the optimization of polymers for medical imaging," Simon said. "The application of machine learning to the chemical sciences can rapidly accelerate the design and discovery of molecules and materials, especially when combined with automation in the lab."

Simon and Fern also recognize the potential benefits of collaborating with industry to further their research and promote workforce development. They invite external partners to join their efforts. Their research teams boast top-tier doctoral candidates from Oregon State, offering valuable access to emerging talent. By fostering these collaborative relationships, they aim to drive innovation in material and molecular discovery while supporting the professional growth of the next generation of experts.

Fern and Simon's partnership demonstrates the transformative potential of combining machine learning and chemical engineering. Their work underscores the importance of interdisciplinary research and sheds light on the future of material and molecular discovery.


If you're interested in connecting with the AI and Robotics Program for hiring and collaborative projects, please contact AI-OSU@oregonstate.edu.

Nov. 27, 2023

Related People

Xiaoli Fern

Associate Professor

Cory Simon

Associate Professor
