Machine Learning
Overview
Machine learning algorithms are designed to extract new knowledge from data automatically. One focus of the Wolverton group is using machine learning to learn more about materials and to create models that can be used to discover new materials. In some of our recent work (described below), we have used machine learning to discover new ternary compounds and to create useful empirical rules for predicting the solubility of various elements in zirconia. We are currently working to extend the techniques demonstrated in these examples to other materials systems and are developing tools to make these capabilities available to the materials science community at large.
Discovering Novel Ternary Compounds
Reference: Meredig, Agrawal et al. Physical Review B. 89 (2014), 094104.
Discovering new crystalline compounds is a very computationally expensive process and is often approached in one of two ways. In the first, one selects a single composition or alloy system where experimental results suggest a new compound might form, then evaluates up to thousands of possible crystal structures to find the most stable arrangement of those atoms, a process that can require thousands of computer hours for a single composition. In the second, one evaluates whether many new combinations of elements can form a stable compound in a commonly occurring crystal structure, an approach that may miss stable compounds that actually form in a different structure.
To remove the need to constrain searches to small composition regions or to assume a crystal structure when searching for new compounds, our group developed a machine learning technique that infers the stability of new compounds from the energies of known compounds. The technique uses formation energies of binary and ternary compounds from the OQMD to train a model whose only input is the composition of a candidate compound, together with various quantities derived from that composition (such as the location of each element on the periodic table). We used this model to predict the formation energies of 1.6 million new compositions and identified 4500 that are likely to correspond to new stable compounds. Of the 9 compositions from this list that we tested, 8 yielded new stable compounds (structures shown above)! We are currently working to determine whether this list contains any new compounds that might be useful for materials applications.
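The composition-only workflow described above (composition, then derived attributes, then a trained model, then screening of candidate compositions) can be sketched in a few lines of Python. This is a minimal illustration under several assumptions, not the model from the paper: the element table holds only periodic-table row and group for a handful of elements, the training energies are made-up placeholder numbers rather than OQMD data, a k-nearest-neighbor regressor stands in for whatever learning algorithm was actually used, and the names `featurize`, `predict`, `screen` and the -1.0 eV/atom cutoff are hypothetical.

```python
import math

# Periodic-table location (row, group) for a few elements. A real model would
# use many more elemental properties; this table is illustrative only.
ELEMENTS = {
    "H": (1, 1), "Li": (2, 1), "O": (2, 16), "F": (2, 17),
    "Na": (3, 1), "Mg": (3, 2), "Al": (3, 13), "Cl": (3, 17),
    "K": (4, 1), "Ca": (4, 2), "Ti": (4, 4), "Fe": (4, 8),
}

# HYPOTHETICAL training set: (composition, formation energy in eV/atom).
# These are placeholder numbers, NOT values taken from the OQMD.
TRAIN = [
    ({"Na": 1, "Cl": 1}, -2.0),
    ({"K": 1, "Cl": 1}, -2.1),
    ({"Mg": 1, "O": 1}, -3.0),
    ({"Ca": 1, "O": 1}, -3.1),
    ({"Fe": 1, "Al": 1}, -0.3),
    ({"Ti": 1, "Al": 3}, -0.4),
]


def featurize(composition):
    """Map e.g. {'Na': 1, 'Cl': 1} to attributes derived only from composition."""
    total = sum(composition.values())
    fracs = {el: n / total for el, n in composition.items()}
    rows = [ELEMENTS[el][0] for el in fracs]
    groups = [ELEMENTS[el][1] for el in fracs]
    mean_row = sum(f * ELEMENTS[el][0] for el, f in fracs.items())
    mean_group = sum(f * ELEMENTS[el][1] for el, f in fracs.items())
    # Fraction-weighted means plus the spread of periodic-table positions.
    return [mean_row, mean_group, max(rows) - min(rows), max(groups) - min(groups)]


def predict(train, composition, k=2):
    """k-nearest-neighbor estimate of formation energy (a stand-in model)."""
    x = featurize(composition)
    dists = sorted((math.dist(x, featurize(c)), e) for c, e in train)
    return sum(e for _, e in dists[:k]) / k


def screen(train, candidates, threshold=-1.0, k=2):
    """Keep candidates whose predicted energy falls below a hypothetical cutoff."""
    hits = [(comp, predict(train, comp, k)) for comp in candidates]
    return sorted((h for h in hits if h[1] < threshold), key=lambda h: h[1])


candidates = [{"Li": 1, "F": 1}, {"Fe": 1, "Ti": 1}]
for comp, energy in screen(TRAIN, candidates):
    print(comp, round(energy, 2))
```

With this toy data only LiF passes the filter, because its nearest neighbors in feature space are the two alkali halides. A real screen would also scale the features before computing distances and would judge stability against the convex hull of competing phases rather than a fixed energy cutoff.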
Learning Intuitive Design Rules Automatically
Reference: Meredig and Wolverton. Chemistry of Materials. 26 (2014), 1985-1991.
Modern materials scientists and engineers rely on a large number of empirical rules: the Hume-Rothery rules for metal solubility, Pauling's rules for ionic crystal structures, and many others created to guide the design of new materials. Given the large amount of materials property data now available in machine-readable form, it should now be possible to extract such intuitive design rules automatically. In this work, we focused on creating simple rules that describe the solubility of various elements in cubic zirconia.
Our method, the Cluster-Ranking-Modeling (CRM) method, works in three steps. First, it automatically groups a dataset into similar materials using unsupervised learning algorithms (such as k-means++). For cubic zirconia, we found that solubility should be described separately for s/p-block elements, early transition metals, heavy elements, and divalent transition metals. Next, we test a variety of possible descriptors for each group to find which correlate best with the material property of interest. For solubility in cubic zirconia, we used attributes such as the radius of the element, the charge of oxygen in that element's oxide, and the valence shell configuration. Finally, we use regression algorithms to create a simple rule describing the property within each group. As shown in the chart on the left, these simple rules can correlate very well with the property of interest. The CRM method is versatile and automatic enough that we envision it can be readily applied to a large number of materials problems.
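The three CRM steps (cluster, rank descriptors, fit a simple rule per cluster) can be illustrated with a small pure-Python sketch. It is not the authors' implementation and makes several assumptions: the data are six synthetic points with two generic descriptors `x1` and `x2` (not zirconia data), descriptors are ranked by absolute Pearson correlation, each rule is a one-descriptor least-squares line, and clustering uses Lloyd's k-means with a deterministic farthest-point variant of k-means++ seeding so the result is reproducible. All names (`kmeans`, `pearson`, `fit_line`, `crm`, `SAMPLES`) are hypothetical.

```python
import math


def kmeans(points, k, iters=100):
    """Lloyd's k-means with deterministic farthest-point seeding."""
    centers = [points[0]]
    while len(centers) < k:
        # Greedy variant of k-means++ seeding: pick the point farthest from its
        # nearest existing center (true k-means++ samples this choice at random).
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(far)
    labels = None
    for _ in range(iters):
        new = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        if new == labels:  # assignments stopped changing: converged
            break
        labels = new
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either variable is constant."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy) if sxx and syy else 0.0


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return 0.0, my
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def crm(samples, names, k):
    """samples: list of (descriptor_vector, target). Returns one rule per cluster."""
    labels = kmeans([x for x, _ in samples], k)           # 1. cluster
    rules = {}
    for j in range(k):
        group = [s for s, lab in zip(samples, labels) if lab == j]
        if not group:
            continue
        ys = [y for _, y in group]
        best = max(range(len(names)),                     # 2. rank descriptors
                   key=lambda i: abs(pearson([x[i] for x, _ in group], ys)))
        slope, intercept = fit_line([x[best] for x, _ in group], ys)  # 3. model
        rules[j] = (names[best], slope, intercept)
    return rules


# SYNTHETIC data: ((x1, x2), target). Purely illustrative numbers.
SAMPLES = [
    ((0.0, 10.0), 0.0), ((0.5, 10.2), 1.0), ((1.0, 9.9), 2.0),
    ((10.0, 0.0), 1.0), ((10.1, 0.5), -0.5), ((9.8, 1.0), -2.0),
]
print(crm(SAMPLES, ["x1", "x2"], k=2))
```

On this synthetic data the two groups are recovered, and each gets a different governing descriptor (`x1` with slope 2 in one cluster, `x2` with slope -3 in the other), which is the kind of separate per-group rule the paragraph above describes. With real data one would normally standardize descriptors before clustering, since k-means is sensitive to feature scale.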
We are also interested in materials representation as a whole. As part of this effort, we have created a Java-based software library named Magpie (short for Materials-Agnostic Platform for Informatics and Exploration). Magpie lets users create, test, and use machine learning models, all through a simple text interface or even interactive webpages (such as this tool for discovering new metallic glass alloys). Magpie is available from Bitbucket under a permissive open-source license. Our goal is for other groups to be able to replicate our results and, hopefully, to use our techniques and models in ways we have not envisioned.