Predicting Thermoelectric Transport Properties from Composition

Apr 23, 2023   •   Luis M. Antunes

Civilization continues to be powered largely by fossil fuels. Though renewable sources of energy, such as wind and solar, are becoming increasingly prevalent, the shift away from fossil fuels (which introduce an extraneous source of carbon into the environment, leading to unwanted effects such as global warming and ocean acidification) has proven too slow to address environmental concerns. One means of reducing the usage of fossil fuels involves recycling the waste heat generated during combustion back into a usable form, such as electricity. About 70% of our civilization's energy is wasted as unusable heat. By making more efficient use of a fuel, less of it is used; and reducing the usage of a fuel will reduce the release of pollutants into the environment. Typically, converting waste heat into electricity involves the use of machines such as steam engines, but these devices are often made of multiple moving parts, which makes them difficult to scale to widespread usage, and they are too cumbersome to be flexible enough for a variety of scenarios. An alternative is a solid-state device known as a thermoelectric generator, built from thermoelectric materials.

Figure 1: Scheme of a thermoelectric couple comprising an n-type and a p-type semiconductor.

A thermoelectric material exhibits the thermoelectric effect, which refers to the diffusion of charge carriers from the hot side to the cold side of the material upon application of a temperature gradient. A thermoelectric generator is a solid-state device built from two thermoelectric materials, one with n-type conductivity, and the other with p-type conductivity (Figure 1). The materials are typically assembled with electrical and thermal connections between a heat source and a heat sink. The efficiency of a thermoelectric generator depends strongly on the temperature difference between the heat source and sink, as well as on the physical characteristics of the materials used. The physical characteristics of a thermoelectric material are usually summarized in the figure of merit: $$ zT = \frac{S^2 \sigma T}{\kappa}, \label{eqn:zT} $$ where \(S\) is the Seebeck coefficient, \(\sigma\) is the electrical conductivity, \(T\) is the absolute temperature, and \(\kappa\) is the thermal conductivity, which contains two main contributions: the lattice thermal conductivity \(\kappa_{\mathrm{latt}}\) due to crystal vibrations, and the electronic thermal conductivity \(\kappa_{\mathrm{elec}}\) due to heat-carrying diffusion of electrons in the solid. The Seebeck coefficient relates the change in the material's electric potential to the change in its temperature. The term \(S^2 \sigma\) is commonly referred to as the power factor. The higher the dimensionless figure of merit \(zT\), the more efficient the thermoelectric material. Consequently, a good thermoelectric material must exhibit a large (absolute) Seebeck coefficient and good electrical conductivity, but low thermal conductivity.
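To make the figure of merit concrete, here is a minimal sketch that evaluates the power factor and \(zT\) from the four quantities defined above. The function names and the input numbers are illustrative assumptions, not values from this study.

```python
def power_factor(seebeck_v_per_k, sigma_s_per_m):
    """Power factor S^2 * sigma, in W m^-1 K^-2."""
    return seebeck_v_per_k ** 2 * sigma_s_per_m


def figure_of_merit(seebeck_v_per_k, sigma_s_per_m, kappa_w_per_mk, temperature_k):
    """Dimensionless zT = S^2 * sigma * T / kappa."""
    if kappa_w_per_mk <= 0 or temperature_k <= 0:
        raise ValueError("kappa and T must be positive")
    return power_factor(seebeck_v_per_k, sigma_s_per_m) * temperature_k / kappa_w_per_mk


# Illustrative inputs only (assumed, not taken from the study):
S = 200e-6      # Seebeck coefficient, V/K (200 microvolts per kelvin)
sigma = 1.0e5   # electrical conductivity, S/m
kappa = 1.5     # total thermal conductivity, W/(m K)
T = 300.0       # absolute temperature, K

pf = power_factor(S, sigma)
zT = figure_of_merit(S, sigma, kappa, T)
print(f"power factor = {pf:.2e} W/(m K^2), zT = {zT:.2f}")
```

Because \(S\) enters squared, its sign (n-type vs. p-type) does not affect \(zT\), which is why only its absolute value matters.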

These requirements of maintaining good electrical conductivity with low thermal conductivity are somewhat at odds, since the diffusion of electrons is required for electrical conductivity, but also results in the transport of heat. These conflicting properties make most materials unsuitable for thermoelectric applications. Indeed, while the phenomenon of thermoelectricity has been known since the 1800s, relatively few materials have been discovered that possess the properties required for practical applications. Moreover, of the materials that are suitable for practical devices, such as Bi2Te3 and PbTe, most (if not all) are too expensive and/or too toxic for widespread use, since they are based on rare or toxic elements, or are difficult to fabricate. To have a meaningful environmental impact, thermoelectric generators must be adopted widely, such as by their direct incorporation into the exhaust systems of vehicles. For this, cheaper and less toxic thermoelectric materials are required.

Searching for new materials

Historically, new materials were found serendipitously, or via the application of chemical intuition. However, the democratization and widespread adoption of high-performance computing, and the development of physical theory that can exploit powerful computational platforms, have enabled a new means of materials discovery, where the search occurs in silico before it is performed in the laboratory. The computational search for new materials usually involves computing the physical properties of very many compounds, and selecting those which exhibit the best characteristics for further investigation. This is known as a High-Throughput Screen (HTS).

To search for new thermoelectric materials, we need to compute the thermoelectric transport properties of a candidate material. However, this depends on knowing the 3-dimensional chemical structure of the material. While the number of possible inorganic solids is quite vast, exceeding \(10^{12}\) for compounds consisting of 4 elements, only about \(10^{5}\) structures are known in materials databases (at best). It's possible that a remarkable thermoelectric material exists amongst known structures. However, to increase the chances of success, we need to expand the scope beyond the space of known structures. We can identify plausible compositions much more easily than we can produce plausible 3D structures, using software tools like SMACT. Therefore, if we had the means to predict physical properties from composition alone, without requiring the material's structure, our chances of identifying a promising material would be greatly enhanced, since our search space would be much larger.
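To illustrate why plausible compositions are cheap to enumerate, here is a minimal sketch of a charge-neutrality filter. It is a hand-rolled stand-in, not SMACT's API, and the oxidation states passed in (Li +1, Bi +3, Se -2) are assumptions chosen for the example.

```python
from functools import reduce
from itertools import product
from math import gcd


def neutral_stoichiometries(oxidation_states, max_count=4):
    """Return reduced integer stoichiometries whose charges sum to zero.

    oxidation_states: dict mapping element symbol -> assumed oxidation state.
    """
    elements = list(oxidation_states)
    found = set()
    for counts in product(range(1, max_count + 1), repeat=len(elements)):
        total_charge = sum(n * oxidation_states[el] for n, el in zip(counts, elements))
        if total_charge != 0:
            continue
        # Reduce e.g. (2, 2, 4) to (1, 1, 2) so each formula appears once.
        divisor = reduce(gcd, counts)
        found.add(tuple(n // divisor for n in counts))
    return [dict(zip(elements, counts)) for counts in sorted(found)]


def formula(stoich):
    """Render a stoichiometry dict as a plain formula string."""
    return "".join(el + (str(n) if n > 1 else "") for el, n in stoich.items())


candidates = neutral_stoichiometries({"Li": 1, "Bi": 3, "Se": -2})
print([formula(c) for c in candidates])  # ['LiBiSe2', 'Li3BiSe3']
```

A real screen would apply further chemical rules (for example, electronegativity ordering) and consider several oxidation states per element; this sketch checks charge balance only.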

To be sure, a material's properties depend quite strongly on its structure, and certain properties are simply not easy (or even possible) to predict from the material's composition alone. However, a surprising amount of information is contained in a composition (see this paper for a fascinating investigation). Indeed, a number of effective, general purpose composition-based property predictors have been developed in recent years, such as ElemNet, Roost, and CrabNet.

Here, we describe the development of a deep neural network for the prediction of thermoelectric transport properties from a material's composition alone, which we use to screen both known and hypothetical spaces of compositions for new thermoelectric materials.


The Seebeck coefficient (S), electrical conductivity (σ), and power factor (PF) are each a function of the temperature, doping level, and doping type (see this article for a description of doping). Thus, a predictive model of thermoelectric transport properties should ideally take these variables into account. Moreover, within a given material, these properties are interrelated, so the predictor should ideally consider all three at once when making a prediction for a given composition.

To address these requirements, we developed CraTENet (Compositionally-restricted attention-based ThermoElectrically-oriented Network). The CraTENet model is based on the CrabNet architecture, and is therefore an attention-based deep neural network. It accepts a composition as input (and optionally a band gap), and produces predictions for S, σ, and PF, each at 13 temperatures and 5 doping levels, for both n-type and p-type doping (Figure 2).
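The size of each output head follows directly from these counts: 13 × 5 × 2 = 130 values per property. The sketch below shows one way such a head could be indexed. The grid values (100 to 1300 K, \(10^{16}\) to \(10^{20}\) cm\(^{-3}\)) and the flattening order are assumptions for illustration; the text above states only the counts.

```python
# Assumed grids: only the counts (13, 5, 2) come from the text above.
TEMPERATURES_K = list(range(100, 1400, 100))   # 13 temperatures, assumed 100..1300 K
DOPING_EXPONENTS = list(range(16, 21))         # 5 levels, assumed 1e16..1e20 cm^-3
DOPING_TYPES = ("n", "p")                      # 2 doping types

VALUES_PER_PROPERTY = len(DOPING_TYPES) * len(DOPING_EXPONENTS) * len(TEMPERATURES_K)


def flat_index(doping_type, doping_exponent, temperature_k):
    """Map (type, level, temperature) to a position in a flat 130-value output.

    The ordering (type, then level, then temperature) is a hypothetical choice.
    """
    t = DOPING_TYPES.index(doping_type)
    d = DOPING_EXPONENTS.index(doping_exponent)
    k = TEMPERATURES_K.index(temperature_k)
    return (t * len(DOPING_EXPONENTS) + d) * len(TEMPERATURES_K) + k


print(VALUES_PER_PROPERTY)          # 130
print(flat_index("p", 20, 700))     # 123
```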

Figure 2: The multi-head attention-based architecture, CraTENet. Each of the three output heads is multi-valued, containing the predictions of the Seebeck coefficient (S), electrical conductivity (σ), or power factor (PF) at different temperatures, doping levels, and doping types.

To train the model, we made use of a publicly available dataset of thermoelectric transport properties for ~48,000 compounds, developed by Francesco Ricci et al. in 2018. The thermoelectric transport properties in this database were computed using ab initio methods based on Density Functional Theory (DFT) and the Boltzmann Transport Equation (BTE). These methods, while producing reliable results, are computationally expensive—hence the need for cheaper and faster predictive models for more exhaustive searches of chemical space.

Figure 3: True values vs. predicted values of the test set of a 90-10 holdout experiment using the CraTENet+gap model, for the Seebeck, across all temperatures, doping levels, and doping types. The plot contains 450,190 points, as there are 3,463 compositions in the test set, each with 130 (13 temperatures × 5 doping levels × 2 doping types) associated values. The inset plot depicts the distribution of absolute errors.

After training the model, we measured its performance on a held-out test set (Figure 3). To achieve the best performance, the model requires a band gap. The error in the predictions is roughly halved when a good quality band gap is provided to the model. This isn't very surprising, since the band gap provides some electronic structure information, and the electronic transport properties of a material depend intimately on its electronic band structure. Nevertheless, this finding is both remarkable and quite useful: with just a material's composition, and a decent guess of its band gap, we can make good predictions of its electronic transport properties.

Figure 4: Seebeck coefficients at 700 K predicted with CraTENet+gap vs. those computed using the ab initio approach, for 23 Materials Project compounds not found in the Ricci database, with p-type doping. Each point represents a particular compound at a particular doping level (e.g. SbTeIr at \(10^{20}\) cm\(^{-3}\)).

To further validate the predictive capabilities of the CraTENet model, we selected 23 compounds from the Materials Project database that were not in the Ricci et al. database, and performed the same ab initio calculations. We compared the CraTENet model's predictions to the ab initio results, and found very good agreement (Figure 4).

Novel thermoelectrics

With the CraTENet model in hand, we then turned our attention to the original goal of scanning large regions of unexplored composition space for novel thermoelectrics. We scanned a known space, comprised of over 54,000 compounds from the Materials Project, and a hypothetical space of ~270,000 ternary selenides generated using SMACT. We used the CraTENet model to perform inference on these compositions. For the space of known compounds, we were able to use the band gaps available in the Materials Project, computed using ab initio methods. For the SMACT-generated selenides, we trained a band gap predictor on all the compounds in the Materials Project, and used the predicted band gaps produced with this predictor.

Figure 5: Predictions of the Seebeck coefficient and log σ for GaCuTeSe using the CraTENet models and the ab initio procedure, for p-type doping, at a level of \(10^{19}\) cm\(^{-3}\). The band gap value used, 0.387 eV, was obtained from the Materials Project. The shaded regions represent ± one standard deviation (i.e. the square root of the predicted variance).
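The shaded band in a plot like Figure 5 can be derived from a predicted mean and variance as in the minimal sketch below. The function name and the numbers are placeholders for illustration, not model outputs.

```python
from math import sqrt


def uncertainty_band(means, variances):
    """Return (lower, upper) bounds at one standard deviation around each mean."""
    bands = []
    for mean, variance in zip(means, variances):
        if variance < 0:
            raise ValueError("a variance cannot be negative")
        std = sqrt(variance)  # standard deviation = square root of the variance
        bands.append((mean - std, mean + std))
    return bands


# Placeholder values, one per temperature point (not taken from the study).
means = [100.0, 150.0, 200.0]
variances = [4.0, 9.0, 16.0]

bands = uncertainty_band(means, variances)
print(bands)  # [(98.0, 102.0), (147.0, 153.0), (196.0, 204.0)]
```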

We then ranked all the compounds in each space, according to their predicted transport properties, and selected those which both had the most favourable properties, and which had not yet been investigated as thermoelectric materials. From the space of known compounds, we identified the following three materials which may prove to exhibit desirable thermoelectric properties:

Since the structures for these compounds are known, we verified, through ab initio calculations, that their electronic transport properties closely match the predictions (Figure 5). We also identified two materials from the space of hypothetical ternary selenides: LiBiSe2 and NaTlSe2.
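The ranking and selection step described above can be sketched as follows. This is a simplified illustration: the `Prediction` record, the use of the power factor as the sole ranking key, and the placeholder entries are all assumptions, not the study's actual procedure or results.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    composition: str        # placeholder label, not a real compound
    power_factor: float     # predicted PF at one chosen temperature and doping
    already_studied: bool   # True if already investigated as a thermoelectric


def shortlist(predictions, top_k):
    """Keep unexplored compositions and return the top_k by predicted PF."""
    unexplored = [p for p in predictions if not p.already_studied]
    ranked = sorted(unexplored, key=lambda p: p.power_factor, reverse=True)
    return ranked[:top_k]


# Placeholder inputs for illustration only.
predictions = [
    Prediction("candidate_A", 2.1, already_studied=False),
    Prediction("candidate_B", 3.4, already_studied=True),
    Prediction("candidate_C", 2.9, already_studied=False),
    Prediction("candidate_D", 0.7, already_studied=False),
]

best = shortlist(predictions, top_k=2)
print([p.composition for p in best])  # ['candidate_C', 'candidate_A']
```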

It's important to note that identifying novel materials with desirable physical properties through computational screens, such as the one described above, is not trivial. In the case of thermoelectrics, there is still much work to be done. One of the biggest challenges when developing a thermoelectric is the issue of dopability: not all compounds can be doped to the level we desire. Our study assumes that each compound can achieve the desired doping. Moreover, not all compounds are stable in the phases we expect at the temperatures we'd like. Finally, this study hasn't addressed the lattice thermal conductivity, which is a crucial aspect of the thermoelectric efficiency of a material. However, we believe that all of these issues can be mitigated by building various, more specialized predictive models. We envision an HTS pipeline that begins from numerous compositions, excludes candidates at each stage based on various criteria, and results in a small pool of candidates that can be subjected, ultimately, to investigation in the laboratory. We also envision that the method we've described in this study can continue to be used as bigger and more accurate databases are produced.
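A staged pipeline of this kind could look something like the minimal sketch below. The stage names mirror the open issues listed above, but the candidate fields, the threshold, and the placeholder entries are hypothetical; in practice each predicate would be backed by its own specialized model.

```python
def run_funnel(candidates, stages):
    """Apply each (name, predicate) stage in order, recording survivor counts."""
    survivors = list(candidates)
    history = [("start", len(survivors))]
    for name, keep in stages:
        survivors = [c for c in survivors if keep(c)]
        history.append((name, len(survivors)))
    return survivors, history


# Placeholder candidates with hypothetical, pre-computed screening fields.
candidates = [
    {"name": "candidate_A", "dopable": True, "stable": True, "kappa_latt": 0.8},
    {"name": "candidate_B", "dopable": False, "stable": True, "kappa_latt": 0.5},
    {"name": "candidate_C", "dopable": True, "stable": False, "kappa_latt": 0.6},
    {"name": "candidate_D", "dopable": True, "stable": True, "kappa_latt": 3.2},
]

# Each stage stands in for a specialized predictive model (assumed thresholds).
stages = [
    ("dopability", lambda c: c["dopable"]),
    ("phase stability", lambda c: c["stable"]),
    ("lattice thermal conductivity", lambda c: c["kappa_latt"] < 2.0),
]

survivors, history = run_funnel(candidates, stages)
for stage_name, count in history:
    print(f"{stage_name}: {count} remaining")
print([c["name"] for c in survivors])  # ['candidate_A']
```

Ordering the cheapest filters first keeps the more expensive checks confined to the small pool that survives the earlier stages.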

We've published the details of this study here. We've also made the source code freely available, under the MIT license. We hope that this research will be of use to the Computational Materials Science community, and aid in the quest to discover new materials for energy applications that address environmental issues, and reduce our dependence on fossil fuels.