Research Article  |  Open Access  |  12 Mar 2025

Prediction of thermal conductivity in multi-component magnesium alloys based on machine learning and multiscale computation

J. Mater. Inf. 2025, 5, 22.
10.20517/jmi.2024.89 |  © The Author(s) 2025.

Abstract

Magnesium (Mg) alloys have attracted considerable attention as next-generation lightweight thermal conducting materials. However, their thermal conductivity decreases significantly with increasing alloying content. Current methods for predicting thermal conductivity of Mg alloys primarily rely on computationally intensive first-principles calculations or semi-empirical models with limited accuracy. This study presents a novel machine learning approach coupled with multiscale computation for predicting thermal conductivity in multi-component Mg alloys. A comprehensive database of 1,139 thermal conductivity measurements from as-cast Mg alloys was systematically compiled. A multiscale feature set incorporating elemental characteristics, thermodynamic properties, and electronic structure parameters was constructed. Key features, including atomic radius differences, enthalpy, cohesive energy, and the ratio of electronic thermal conductivity to relaxation time, were identified through sequential forward floating selection (SFFS). The XGBoost algorithm demonstrated superior performance, achieving a mean absolute percentage error (MAPE) of 2.16% for low-component ternary and simpler Mg alloy systems. Through L1 and L2 regularization optimization, the model’s extrapolation capability for quaternary and higher-order novel systems was significantly enhanced, reducing the prediction error to 13.60%. This research provides new insights and theoretical guidance for accelerating the development of high thermal conductivity Mg alloys.

Keywords

Magnesium alloys, thermal conductivity, machine learning, multiscale computation

INTRODUCTION

With the increasing demands for high-performance, highly integrated, and miniaturized electronic devices, thermal management has emerged as a critical bottleneck limiting their development[1]. While traditional copper-based thermal conductive materials exhibit excellent thermal properties, their high density (8.9 g/cm³) restricts their use in lightweight applications. Magnesium (Mg) alloys, as prominent green engineering materials of the 21st century, offer numerous advantages including abundant reserves, low density, high thermal conductivity, high electrical conductivity, and superior electromagnetic shielding effectiveness[2-4]. Pure Mg, in particular, with its thermal conductivity of 155.3 W/(m·K), demonstrates the highest thermal conductivity per unit mass, making it an ideal candidate for next-generation lightweight thermal management materials[1,2].

However, improving the mechanical properties of Mg alloys typically requires the addition of alloying elements to the α-Mg matrix. While this alloying process enhances mechanical properties, it often significantly degrades thermal conductivity[2,5-7]. For instance, commercial Mg alloys such as AM60, AZ91, AZ91D, and AZ80 exhibit room temperature thermal conductivity of only approximately 40% that of pure Mg, failing to meet the requirements for high heat dissipation applications[8-11]. To maximize thermal conductivity while maintaining mechanical properties, it is crucial to establish composition-microstructure-thermal property relationships in these alloys.

Theoretical approaches for predicting Mg alloy thermal conductivity primarily include microscale first-principles methods and macro/mesoscale semi-empirical theoretical models[12]. First-principles methods, based on density functional theory (DFT), calculate intrinsic thermal conductivity by solving heat transport equations while considering alloy microstructure[13,14]. However, these methods are computationally intensive and primarily applicable to compound phases or specific local structures, limiting their utility for complex multi-component alloy systems. Macro and mesoscale approaches, such as CALculation of PHAse Diagrams (CALPHAD) and effective medium theory (EMT), based on thermodynamic or empirical rules, offer rapid thermal conductivity predictions but with limited accuracy and applicability.

Recent years have witnessed the emergence of data-driven machine learning approaches as novel tools for materials property prediction. These methods can autonomously learn structure-property relationships from vast experimental and computational datasets, facilitating rapid screening and optimization of new materials[15-18]. Machine learning enables materials screening at computational costs several orders of magnitude lower than DFT methods[19-22]. Traditional alloy property prediction models typically use chemical composition as input features[20,23,24]. However, this approach neglects various factors including element interactions, microstructure, and heat treatment effects, resulting in significant accuracy degradation when predicting novel alloy systems.

Researchers have begun incorporating physical information from DFT and CALPHAD methods into machine learning models to enhance generalization capability[25-27]. In 2022, Chen et al. at the University of California developed a machine learning model for predicting antiphase boundary energy in Ni3Al-based alloys using crystal structure descriptors obtained from DFT calculations[28]. In 2020, Liu et al. at Shanghai University reduced prediction errors by 2.9% in forecasting nickel-based single crystal superalloy creep rupture life by incorporating domain knowledge descriptors such as lattice parameters, phase fractions, and diffusion coefficients from CALPHAD calculations[29]. In 2021, Zou et al. at Northwestern Polytechnical University constructed a machine learning model for discovering high-strength, high-toughness titanium alloys by incorporating electronic property descriptors from DFT calculations and solute-matrix interactions from CALPHAD computations[30]. Currently, machine learning applications in studying Mg alloy thermal conductivity remain limited. There is an urgent need to systematically investigate the relationships between alloying elements, thermodynamic properties, electronic structure features, and thermal conductivity from a multiscale perspective to guide the design of high thermal conductivity Mg alloys.

This study focuses on multi-component, multiphase Mg alloy systems, systematically collecting 1,139 thermal conductivity experimental data points and constructing a domain knowledge-based multiscale high-dimensional feature space encompassing elemental characteristics, thermodynamic properties, and electronic structure. Based on this framework, we developed machine learning models for predicting Mg alloy thermal conductivity and revealed the influence mechanisms of different features through feature selection and model interpretability analysis. Results demonstrate that by integrating cross-scale physical features, the machine learning model accurately predicts thermal conductivity in low-component ternary and simpler Mg alloys with a mean absolute percentage error (MAPE) of 2.16%. Furthermore, incorporating L1 and L2 regularization effectively improves the model’s extrapolation capability for quaternary and higher-order novel systems, maintaining MAPEs below 13.60%, providing new insights for accelerating the development of next-generation high thermal conductivity Mg alloys.

MATERIALS AND METHODS

Dataset construction

A comprehensive dataset comprising 1,139 thermal conductivity measurements from as-cast Mg alloys was systematically compiled, with detailed collection methodology described in Supplementary Materials. The thermal conductivity values in the dataset span from 8.1 to 167.0 W/(m·K), covering a wide range of commercial and experimental Mg alloys. Detailed statistical analysis of the thermal conductivity distribution and alloying element distribution as functions of the number of components is provided in Supplementary Figure 1A-C. The dataset encompasses 52 distinct alloy systems with 332 unique compositions. Of these, low-component systems (encompassing pure elements, binary, and ternary alloys) constitute 955 data points, while high-component systems (quaternary and more complex compositions) account for 184 data points, representing 16.2% of the total dataset. The distribution of data points across different component numbers is detailed in Supplementary Figure 1D, showing a decreasing trend in data availability as the number of components increases.

Feature engineering

The properties of Mg alloys are intrinsically linked to the inherent characteristics of their constituent elements. Through quantitative compositional analysis, a comprehensive set of features reflecting elemental properties was established using both compositional averages and standard deviations. For a multi-component alloy system with n elements, the mean value (μ) and standard deviation (σ) of each elemental property were calculated as:

$$ \mu =\sum C_iX_i $$

$$ \sigma =\sqrt{\sum C_i(X_i-\mu)^2} $$

where Ci represents the atomic fraction of element i, and Xi represents the corresponding elemental property. These features encompass atomic properties, electronic structure characteristics, thermodynamic properties, and physical parameters[31], totaling 45 distinct descriptors as detailed in Supplementary Table 1.
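The two descriptors above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `composition_features` and the radius values are assumptions introduced here for demonstration only.

```python
import math

def composition_features(fractions, properties):
    """Return (mean, std) of an elemental property weighted by atomic fraction.

    fractions  : dict element -> atomic fraction C_i (expected to sum to 1)
    properties : dict element -> elemental property X_i
    """
    total = sum(fractions.values())
    if not math.isclose(total, 1.0, abs_tol=1e-6):
        raise ValueError("atomic fractions must sum to 1")
    # mu = sum_i C_i * X_i
    mu = sum(c * properties[el] for el, c in fractions.items())
    # sigma = sqrt(sum_i C_i * (X_i - mu)^2)
    var = sum(c * (properties[el] - mu) ** 2 for el, c in fractions.items())
    return mu, math.sqrt(var)

# Illustrative inputs only: approximate atomic radii in pm, not taken from the paper.
radius_pm = {"Mg": 160.0, "Al": 143.0, "Zn": 134.0}
alloy = {"Mg": 0.95, "Al": 0.03, "Zn": 0.02}
mu, sigma = composition_features(alloy, radius_pm)
```

For a pure element the standard deviation is zero, so σ acts as a measure of mismatch between the constituents for the chosen property.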

Beyond elemental characteristics, thermodynamic parameters derived from computational methods serve as mesoscale features for predicting alloy performance. Phase diagram characteristics, enthalpy, entropy, and Gibbs free energy (G) significantly influence the microstructural evolution and macroscopic properties of alloys. Using the CALPHAD approach combined with the Scheil-Gulliver model[32], detailed thermodynamic information for as-cast Mg alloys was obtained. These thermodynamic calculations were conducted using Pandat software with the PanMg database, a specialized commercial database developed for multi-component Mg alloy systems[33]. The resulting mesoscale features, documented in Supplementary Table 2, establish quantitative composition-structure-property relationships. The calculated phase fractions and phase types were subsequently utilized in DFT modeling to investigate fundamental physical properties, including electronic structures of individual phases.

For metallic systems, DFT self-consistent calculations provide ground-state electron density distributions, enabling the determination of electronic band structures and density of states. By combining DFT-calculated electronic band structures with the Boltzmann transport equation, quantitative predictions of electronic transport properties were obtained. Electronic structure calculations were performed using the Vienna Ab-initio Simulation Package (VASP) with the Perdew–Burke–Ernzerhof (PBE) functional and projector-augmented wave (PAW) pseudopotentials. The Brillouin zone was sampled using an equivalent 20 × 20 × 20 Monkhorst-Pack k-point mesh, with a plane-wave cutoff energy of 400 eV. The energy convergence criterion was set to 10⁻⁵ eV and the force convergence criterion to 0.01 eV/Å[34-36]. The electronic transport properties were calculated using the BoltzTraP2 package[37], yielding the ratios of electronic thermal conductivity and electrical conductivity to relaxation time for various Mg alloy compositions. While absolute values of electronic thermal conductivity require consideration of phonon scattering and defect interactions, precise calculations of these scattering processes demand substantial computational resources (typically > 10,000 CPU cores). Therefore, this study employs the ratios of electronic thermal conductivity and electrical conductivity to relaxation time (κe/τ and σ/τ) as machine learning input features, providing reliable first-principles physical information at reasonable computational cost. The resulting microscale features are documented in Supplementary Table 3. All calculations were performed on a high-performance computing cluster, with detailed computational resource requirements provided in Supplementary Materials.

To unify the diverse scales of different physical properties, all features were standardized using the StandardScaler method:

$$ X^{'}=(X-\mu) / \sigma $$

where X represents the original feature values, μ is the mean, and σ is the standard deviation of each feature. The scaling parameters were calculated using only the training data to prevent data leakage, and these same parameters were then applied to transform the test set features. To identify the optimal feature subset for thermal conductivity prediction, the sequential forward floating selection (SFFS) algorithm[38] was employed with random forest as the evaluator[39,40]. Starting with an empty feature set, SFFS iteratively adds features that maximize model performance. After each forward step, it performs backward steps to remove features whose exclusion might improve performance. This process continues until the selected feature subset achieves optimal predictive capability.
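The leakage-free standardization described above can be illustrated with a short standard-library sketch. The helper names `fit_scaler` and `transform` and the numbers are hypothetical; the population standard deviation is used because that is what scikit-learn's `StandardScaler` computes.

```python
import statistics

def fit_scaler(train_column):
    """Mean and population standard deviation of one feature, from training data only."""
    mu = statistics.fmean(train_column)
    sigma = statistics.pstdev(train_column)
    # Guard against constant features, which would otherwise divide by zero.
    return mu, (sigma if sigma > 0 else 1.0)

def transform(column, mu, sigma):
    """Apply X' = (X - mu) / sigma with previously fitted parameters."""
    return [(x - mu) / sigma for x in column]

# Toy values for a single feature (illustrative only).
train = [10.0, 20.0, 30.0]
test = [40.0]

mu, sigma = fit_scaler(train)               # parameters come from the training set
train_scaled = transform(train, mu, sigma)
test_scaled = transform(test, mu, sigma)    # the same parameters are reused for the test set
```

Because the test value lies outside the training range, its scaled value falls outside the span of the scaled training data, which is the expected behavior when the scaler never sees the test set.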

Machine learning framework

Machine learning is a data-driven modeling approach that enables predictions and decision-making by learning latent patterns from large datasets. To comprehensively explore the complex non-linear relationships inherent in Mg alloy thermal conductivity data, this study employs a diverse range of machine learning algorithms. These include linear models (linear regression, ridge regression, LASSO regression, and elastic net), tree-based methods (decision trees, random forest, gradient boosting trees, CatBoost, LightGBM, and XGBoost), kernel methods [support vector regression (SVR)], nearest neighbor methods [K-nearest neighbors (KNNs)], probabilistic models (Bayesian ridge regression), and artificial neural networks (ANNs) from deep learning. Table 1 summarizes the complete names and abbreviations of all algorithms.

Table 1

Machine learning algorithms and their abbreviations used in this study

| Algorithm | Abbreviation | Algorithm | Abbreviation |
| --- | --- | --- | --- |
| Linear regression | LR | XGBoost | XGB |
| Ridge regression | RR | LightGBM | LGBM |
| LASSO regression | LASSO | CatBoost | CB |
| Elastic net | EN | Bayesian linear regression | BLR |
| Decision tree | DT | Bayesian ridge regression | BRR |
| Random forest | RF | Gaussian process regression | GPR |
| Gradient boosting machine | GBM | K-nearest neighbors | KNN |
| Support vector regression | SVR | Artificial neural network | ANN |

To enhance the predictive performance of these machine learning models, an automated hyperparameter optimization strategy based on Bayesian optimization with 10-fold cross-validation was implemented during the training process. MAPE was chosen as the scoring metric for model evaluation due to its effectiveness in capturing relative prediction errors across different thermal conductivity scales[41]. To evaluate model performance and ensure robust predictive capability, the dataset was strategically partitioned based on alloy composition complexity. The 955 data points from low-component systems (pure elements, binary, and ternary alloys) were designated as the training set, while the remaining 184 data points from high-component systems (quaternary and higher-order alloys, comprising 16.2% of the total dataset) served as the test set. This structured split was specifically designed to simulate real-world scenarios where data from simpler systems is used to predict properties of newly developed complex alloys. The proportion of high-component test data (16.2%) approximates the conventional 80-20 split commonly employed in machine learning studies, providing sufficient data for reliable model evaluation while maintaining a robust training set. Analysis of the compositional space coverage revealed that both thermal conductivity values and alloying element concentrations in the high-component test set fall within the ranges established by the low-component training set, as shown in Supplementary Figure 1, ensuring representative sampling across the full composition space.
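The complexity-based partition and the MAPE metric can be sketched as follows. This is an illustrative sketch under assumptions: the record layout, the function names, and the thermal conductivity values are placeholders and do not come from the dataset.

```python
def n_components(composition, tol=1e-12):
    """Count the elements present in a composition dict (element -> atomic fraction)."""
    return sum(1 for c in composition.values() if c > tol)

def split_by_complexity(records, max_train_components=3):
    """Pure, binary, and ternary records form the training set; the rest form the test set."""
    train = [r for r in records if n_components(r["composition"]) <= max_train_components]
    test = [r for r in records if n_components(r["composition"]) > max_train_components]
    return train, test

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

# Placeholder records; the "kappa" values are invented for illustration.
records = [
    {"composition": {"Mg": 1.0}, "kappa": 150.0},
    {"composition": {"Mg": 0.97, "Al": 0.03}, "kappa": 100.0},
    {"composition": {"Mg": 0.95, "Al": 0.03, "Zn": 0.02}, "kappa": 90.0},
    {"composition": {"Mg": 0.94, "Al": 0.03, "Zn": 0.02, "Mn": 0.01}, "kappa": 80.0},
]
train, test = split_by_complexity(records)
example_error = mape([100.0, 50.0], [110.0, 45.0])  # two predictions, each off by 10%
```

Splitting by the number of components rather than at random means the test error measures extrapolation to unseen alloy systems, not interpolation within known ones.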

To enhance model generalization capability and prevent overfitting, both L1 and L2 regularization terms were incorporated into the machine learning model training. The L1 regularization (LASSO) and L2 regularization (ridge) terms were explored within a range from 1E-3 to 1E4. The optimal regularization parameters were determined through grid search with cross-validation to balance model complexity and predictive performance.
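As a hedged illustration of how the two penalties enter a boosted-tree model, assume the regularized model is XGBoost and that the L1 and L2 terms correspond to its `reg_alpha` (α) and `reg_lambda` (λ) parameters; the text does not name the parameters, so this mapping is an assumption. In the notation of the XGBoost documentation, the training objective then takes the form:

$$ \mathrm{Obj}=\sum_i l(y_i,\hat{y}_i)+\sum_k \Omega(f_k),\qquad \Omega(f)=\gamma N_{\mathrm{leaf}}+\frac{1}{2}\lambda \sum_j w_j^2+\alpha \sum_j |w_j| $$

where l is the loss between the measured value y_i and the prediction ŷ_i, f_k is the k-th tree, N_leaf is its number of leaves, w_j are its leaf weights, and γ is the per-leaf complexity penalty. These symbols are introduced here for illustration and do not appear in the original text. Larger α and λ shrink the leaf weights, which smooths the predictions and can reduce overfitting to the low-component training data.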

RESULTS AND DISCUSSION

Multiscale feature analysis

Mesoscale features based on CALPHAD calculations

To investigate the correlation between thermodynamic parameters and thermal conductivity in Mg alloys, this study calculated various thermodynamic parameters at testing temperatures (T) using the Scheil-Gulliver non-equilibrium solidification model, as illustrated in Figure 1A-F. The results demonstrate that enthalpy and G exhibit monotonic variations with temperature, while compositional changes within the same alloy system have relatively limited effects on these parameters. For instance, when the Al content increases from 0.5 at.% to 1.7 at.%, the variations in alloy enthalpy (H) and G remain below 1.0%. This limited variation primarily results from the consistent phase constitution [hexagonal close-packed (HCP) + Mg17Al12] during non-equilibrium solidification within this composition range.

Figure 1. Relationships between thermodynamic properties, thermal conductivity, and temperature in Mg alloys. (A and B) Enthalpy; (C and D) Entropy; (E and F) Gibbs free energy. Left panels (A, C, E) show the complete dataset, while right panels (B, D, F) present typical binary systems.

Unlike enthalpy, entropy shows relatively low sensitivity to changes in alloy composition. As shown in Figure 1D, the distributions of Mg-Al and Mg-Zn alloys in the temperature-entropy-thermal conductivity space nearly overlap completely. This suggests that entropy cannot effectively differentiate the impacts of different alloying elements and their concentration on material properties within the studied systems. Considering the fundamental thermodynamic equation G = H - TS, the influence of alloying elements on G primarily stems from enthalpic contributions, resulting in similar correlations with thermal conductivity as observed for enthalpy.

To gain deeper insights into the influence of alloy composition on microstructural evolution, we systematically calculated the phase constitution and molar fractions across the complete dataset. Taking Mg-Al and Mg-Zn binary systems as representative examples, Figure 2A shows that for Mg-Al alloys, the microstructure consists of HCP-Mg matrix and Mg17Al12 intermetallic compound when the Al content is below 14.2 at.%. As Al content increases, the molar fraction of the Mg17Al12 phase increases non-linearly, reaching 22.0% at 14.2 at.% Al. The increased Mg17Al12 phase fraction leads to enhanced interfacial thermal resistance and lattice distortion at phase boundaries, impeding heat transfer across interfaces. Consequently, the thermal conductivity decreases from 131.2 to 41.0 W/(m·K), as shown in Figure 2B. Similarly, Mg-Zn alloys primarily consist of HCP-Mg matrix and Mg5Zn2 intermetallic compound, with phase fractions varying linearly with alloying element content within the studied composition range (0.2-9.4 at.% Zn).

Figure 2. Effect of alloying element content on non-equilibrium phase constitution and thermal conductivity in typical binary alloys. (A) Phase fraction; (B) Thermal conductivity.

In conclusion, thermodynamic properties serve as crucial mesoscale features for constructing predictive models of thermal properties in Mg alloys. The significant correlation between enthalpy and thermal conductivity reflects the regulatory effects of composition and temperature on alloy microstructure and interatomic interactions. Furthermore, thermodynamic properties reveal important information about thermal stability, phase transformation behavior, and phase constitution, establishing intrinsic connections between composition, structure, and properties.

Microscale features based on DFT calculations

DFT calculations were employed to systematically investigate the crystal and electronic structures of all Mg alloy phases in the dataset. To validate the computational reliability, lattice constants of pure HCP-Mg, FCC-Al, and HCP-Zn were compared with experimental and computational values from literature, as shown in Supplementary Table 4. All calculated values deviated by less than 5% from literature data, confirming the reliability of our computational methodology and parameters.

Electronic density of states distributions were obtained through non-self-consistent calculations of band structures and density of states. The thermal conductivity could then be qualitatively predicted by analyzing electronic structure features near the Fermi level. Figure 3 illustrates the crystal structures and electronic density of states for two representative phases (HCP-Mg and Mg17Al12). For the HCP-Mg phase, electronic states near the Fermi level primarily originate from Mg 3s and 3p orbitals, with a density of states reaching 0.43 states/(eV·atom), indicating good metallic characteristics. Upon alloying with elements such as Al, significant changes occur in the crystal structure. Taking the Mg17Al12 intermetallic compound as an example, the interaction between Al 3s/3p and Mg 3s/3p orbitals leads to valence band broadening and a reduction in density of states near the Fermi level by 0.10 states/(eV·atom), exhibiting typical intermetallic electronic features that result in decreased thermal conductivity.

Figure 3. Crystal structures and electronic structure characteristics of typical phases in Mg alloys. (A) HCP-Mg; (B) Mg17Al12. HCP: Hexagonal close-packed.

Following electronic structure analysis, thermal and electrical transport properties were calculated using the Boltzmann transport equation with relaxation time approximation. To validate the methodology, electronic thermal conductivity and electrical conductivity were calculated for pure Mg, Al, and Zn systems and compared with theoretical literature values, as summarized in Supplementary Tables 5 and 6. The calculated electronic thermal conductivities for Mg, Al, and Zn show excellent agreement with literature values[42] across the temperature range of 300-700 K, with average deviations remaining within 1.9%.

Figure 4 presents calculated κe/τ and σ/τ values for 36 Mg alloy phases in the dataset. These transport parameters vary by more than two orders of magnitude across different phases, demonstrating high sensitivity of thermal/electrical properties to chemical composition and microstructure. The HCP-Mg phase exhibits the highest κe/τ and σ/τ, consistent with the superior electrical and thermal conductivity of pure Mg in practical applications. With increased alloying, κe/τ and σ/τ gradually decrease due to enhanced lattice distortion and electron scattering. Moreover, κe/τ and σ/τ show highly consistent variation patterns, with linear fitting yielding a Lorenz number of 2.41 × 10⁻⁸ W·Ω/K², validating the Wiedemann-Franz law in Mg alloy systems.
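The Wiedemann-Franz relation κe = L·σ·T implies that κe/τ plotted against T·σ/τ should fall on a line through the origin with slope L. The sketch below shows one way such a fit could be done and compares it with the Sommerfeld free-electron value. The fitting procedure (least squares through the origin at a single temperature), the function names, and the synthetic inputs are assumptions; the source does not describe how the fit was performed.

```python
import math

K_B = 1.380649e-23          # Boltzmann constant in J/K (exact SI value)
E_CHARGE = 1.602176634e-19  # elementary charge in C (exact SI value)

def sommerfeld_lorenz_number():
    """Free-electron Lorenz number L0 = (pi^2 / 3) * (k_B / e)^2 in W·Ω/K²."""
    return (math.pi ** 2 / 3.0) * (K_B / E_CHARGE) ** 2

def fit_lorenz_number(kappa_over_tau, sigma_over_tau, temperature):
    """Least-squares slope through the origin of kappa_e/tau versus T * sigma/tau."""
    x = [temperature * s for s in sigma_over_tau]
    num = sum(xi * ki for xi, ki in zip(x, kappa_over_tau))
    den = sum(xi * xi for xi in x)
    return num / den

# Synthetic inputs generated from the reported slope; magnitudes are arbitrary.
T = 300.0
sigma_over_tau = [1.0e20, 5.0e20, 2.0e21]
kappa_over_tau = [2.41e-8 * T * s for s in sigma_over_tau]

L_fit = fit_lorenz_number(kappa_over_tau, sigma_over_tau, T)
L0 = sommerfeld_lorenz_number()
relative_gap = abs(L_fit - L0) / L0
```

The Sommerfeld value is about 2.44 × 10⁻⁸ W·Ω/K², so a fitted slope of 2.41 × 10⁻⁸ W·Ω/K² differs from it by roughly 1%.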

Figure 4. The ratios of electronic thermal conductivity to relaxation time (κe/τ) and electrical conductivity to relaxation time (σ/τ) for Mg alloy phases in the dataset.

In conclusion, the calculated microscale parameters, including lattice constants, electronic density of states distribution, and Fermi level position, serve as crucial microscale features for constructing predictive models of Mg alloy properties. The relaxation-time-independent parameters κe/τ and σ/τ quantitatively characterize the intrinsic electrical and thermal conduction capabilities of different phases.

Optimal feature subset selection

To develop a high-performance machine learning model for predicting thermal conductivity in Mg alloys, we employed the SFFS algorithm for feature selection. Random Forest was chosen as the evaluator during feature selection, with MAPE as the performance metric. The selection process was conducted on three distinct feature datasets to systematically evaluate the importance and complementarity of features across different scales and physical mechanisms: (1) features containing only elemental properties; (2) features combining elemental properties and CALPHAD calculations (including the thermodynamic properties analyzed in Figures 1 and 2); and (3) a complete feature set further incorporating DFT calculations (including the electronic structure parameters examined in Figures 3 and 4).
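The floating selection loop can be sketched in plain Python. This is a simplified illustration and not the authors' implementation: in the study the score would be the cross-validated MAPE of a random forest, whereas here `toy_error` is a made-up function chosen so that the backward step has a visible effect.

```python
def sffs(features, score, k_max):
    """Sequential forward floating selection that minimizes score(subset).

    features : list of candidate feature names
    score    : callable taking a tuple of names, returning an error (lower is better)
    k_max    : largest subset size to explore
    Returns a dict mapping subset size -> (best score, best subset) found at that size.
    """
    selected = []
    best = {}
    while len(selected) < k_max:
        # Forward step: add the candidate that gives the lowest error.
        candidates = [f for f in features if f not in selected]
        s_new, f_new = min((score(tuple(selected + [f])), f) for f in candidates)
        selected = selected + [f_new]
        size = len(selected)
        if size not in best or s_new < best[size][0]:
            best[size] = (s_new, tuple(selected))
        # Floating step: drop a feature while doing so beats the best
        # subset recorded so far at the smaller size.
        while len(selected) > 2:
            s_rem, f_rem = min(
                (score(tuple(f for f in selected if f != g)), g) for g in selected
            )
            if s_rem < best[len(selected) - 1][0]:
                selected = [f for f in selected if f != f_rem]
                best[len(selected)] = (s_rem, tuple(selected))
            else:
                break
    return best

def toy_error(subset):
    """Hypothetical error: 'b' and 'c' are only moderately useful alone but strong together."""
    s = set(subset)
    err = 12.0
    err -= 3.0 if "a" in s else 0.0
    err -= 2.5 if "b" in s else 0.0
    err -= 2.4 if "c" in s else 0.0
    err -= 0.1 if "d" in s else 0.0
    err -= 3.0 if {"b", "c"} <= s else 0.0
    return err

result = sffs(["a", "b", "c", "d"], toy_error, k_max=3)
```

In this toy case plain forward selection would keep `("a", "b")` as its two-feature subset, while the floating step revisits size two after reaching three features and finds that `("b", "c")` has a lower error.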

As shown in Figure 5A and B, the selection results across the three feature datasets reveal that in the elemental properties subset (four features), the importance ranking is: atomic radius difference (γr), T, mean cohesive energy (EC), and standard deviation of valence electron concentration (Vs). After incorporating thermodynamic features, the optimal subset (six features) includes additional thermodynamic parameters reflecting phase equilibrium and stability, ranked by importance as: γr, H, EC, mean atomic mass (W), G, and mean shear modulus (SM). With the further addition of first-principles features, the optimal subset (six features) incorporates parameters reflecting atomic-scale structure and electronic properties, ranked as: γr, H, EC, W, ratio of electronic thermal conductivity to relaxation time (κe/τ), and SM. Notably, the addition of thermodynamic features resulted in H replacing the testing temperature T, confirming our findings from Section “Mesoscale features based on CALPHAD calculations” that H encompasses both temperature and elemental information. When first-principles features were introduced, κe/τ replaced G: because G is composed of H and S, and H was already included as a thermodynamic feature, G adds little new information, whereas κe/τ contributes independent electronic-structure information.

Figure 5. Feature selection results of thermal conductivity dataset using SFFS and Kneed algorithms. (A) MAPE variation with the number of features; (B) Normalized distance values versus number of features in Kneed algorithm; (C-E) Spearman correlation coefficient matrices of optimal feature subsets. SFFS: Sequential forward floating selection; MAPE: mean absolute percentage error.

To further analyze the correlation between selected features, Spearman correlation analysis was performed on the feature subsets. As shown in Figure 5C-E, correlation coefficients between features within all three subsets remain below 0.8, indicating low correlation and minimal redundancy. This result validates the effectiveness of the SFFS algorithm in feature selection, demonstrating that the selected feature subsets possess good complementarity and information richness, providing high-quality inputs for subsequent machine learning modeling.
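The redundancy check can be sketched with a standard-library implementation of the Spearman coefficient (Pearson correlation of average ranks). The function names, the toy columns, and the use of the absolute value of the coefficient against the 0.8 threshold are assumptions made for illustration.

```python
import math

def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

def redundant_pairs(columns, threshold=0.8):
    """Feature pairs whose absolute Spearman coefficient reaches the threshold."""
    names = list(columns)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if abs(spearman(columns[a], columns[b])) >= threshold
    ]

# Toy feature columns (illustrative only).
columns = {
    "f1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "f2": [2.0, 4.0, 6.0, 8.0, 11.0],   # monotonic in f1, hence redundant
    "f3": [3.0, 1.0, 4.0, 2.0, 5.0],
}
flagged = redundant_pairs(columns)
```

An empty result for a selected subset would correspond to the situation reported here, where no pair of selected features reaches the 0.8 threshold.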

Machine learning-based thermal conductivity prediction

Comparison of model prediction performance

Following optimal feature subset selection, various machine learning algorithms were employed to construct thermal conductivity prediction models for low-component Mg alloys, with model performance evaluated through 10-fold cross-validation to obtain a model with strong generalization ability and high prediction accuracy. Figure 6 presents the MAPE and root mean square error (RMSE) results for each model with optimized hyperparameters. XGBoost consistently outperformed other models across all three feature subsets, achieving MAPE values below 2.50%, demonstrating its effectiveness in handling non-linear relationships in thermal conductivity prediction.

Figure 6. Comparison of 10-fold cross-validation prediction performance for different machine learning models across three feature subsets. (A) MAPE; (B) RMSE. MAPE: Mean absolute percentage error; RMSE: root mean square error.

The machine learning models stratified clearly by MAPE into roughly three categories. Linear models, represented by LR, LASSO, and EN, typically showed MAPE values between 20% and 30%, indicating their limitation in capturing non-linear relationships. Tree-based models, including DT, RF, and XGB, achieved significantly lower MAPE values in the range of 2% to 5%, consistently outperforming other model types. The third category, including SVR, KNN, and ANN, showed intermediate performance with MAPE values between 5% and 20%. While these models account for non-linear factors, their prediction performance and stability were inferior to those of tree-based models.

For low-component systems, expanding the feature subset showed limited performance improvement in cross-validation. XGBoost achieved a MAPE of 2.32% with elemental properties alone, improving only marginally to 2.27% and 2.16% when thermodynamic (CALPHAD) and first-principles (DFT) features were added, respectively. Although basic elemental descriptors adequately characterize the thermal conductivity behavior in binary and ternary systems, multiscale features reduce model over-reliance on specific key descriptors and result in more consistent performance across different data subsets. The standard deviation of RMSE for XGBoost decreased from 1.17 to 0.86 W/(m·K) and 0.90 W/(m·K) with the expanded feature sets, indicating that additional physical features provide complementary information about material thermal conductivity. This is particularly important for high-component systems, where complex interactions among multiple alloying elements necessitate additional features to capture the underlying physical mechanisms affecting thermal conductivity.

Model interpretability analysis

To further analyze error sources and interpret the machine learning model, we focused on the XGBoost model, which demonstrated the highest prediction accuracy. Figure 7A-C shows scatter plots comparing predicted values against experimental thermal conductivity across three feature subsets. Across all thermal conductivity ranges, over 80% of the data points showed prediction errors below 10%; notably, in the medium thermal conductivity range [100.0-150.0 W/(m·K)], over 95% of predictions had errors below 10%. The balanced distribution of positive (overestimation) and negative (underestimation) deviations suggests that prediction errors are primarily dominated by random noise rather than systematic bias, confirming the robustness of XGBoost against outliers and noise in practical thermal conductivity prediction tasks.

Figure 7. XGBoost model performance and SHAP analysis for three feature subsets. (A-C) Predicted versus experimental thermal conductivity; SHAP analysis results in (D-F) mean absolute SHAP values for each feature; (G-I) SHAP beeswarm plots. SHAP: Shapley Additive Explanations.

The internal mechanisms of the XGBoost model were revealed through Shapley Additive Explanations (SHAP) analysis. Figure 7D-F presents the mean absolute SHAP values through bar plots, quantitatively demonstrating the importance of each feature. The mean absolute SHAP value of EC was more than twice that of any other feature, highlighting its dominant influence on model outputs. Additionally, γr and W demonstrated significant mean absolute SHAP values, suggesting their substantial impact on thermal conductivity predictions. Figure 7G-I utilizes SHAP beeswarm plots to visualize the direction of feature effects through color coding. EC exhibits a clear negative correlation with Mg alloy thermal conductivity: as EC increases (indicated by warmer colors), the horizontal position (SHAP value) shifts leftward, indicating decreasing predicted thermal conductivity. Furthermore, Vs, W, SM, and alloy G all demonstrate negative correlations with Mg alloy thermal conductivity.

Through detailed error analysis and interpretability study of the XGBoost model, we have thoroughly investigated the performance characteristics and internal mechanisms of the thermal conductivity prediction model for low-component Mg alloys. Results demonstrate that the XGBoost model exhibits excellent prediction performance and stability across all thermal conductivity ranges, with EC emerging as one of the most crucial factors affecting Mg alloy thermal conductivity. Moreover, γr, W, and T were confirmed as significant features influencing thermal conductivity prediction. Based on these SHAP analysis results, key guidelines for designing high thermal conductivity Mg alloys can be derived. For example, the strong negative correlation between EC and thermal conductivity suggests that alloying elements resulting in lower system cohesive energy should be prioritized. These quantitative relationships provide practical guidance for the rational design of new high thermal conductivity Mg alloys.

Model extrapolation optimization

A key limitation of machine learning models is their restricted extrapolation capability, particularly in predicting properties of new compositions or service conditions beyond the training data range. To evaluate the extrapolation capabilities of machine learning models in predicting thermal conductivity of high-component Mg alloys, we trained the XGBoost model on a complete low-component dataset (pure elements, binary, and ternary systems) and tested it on quaternary and higher-order Mg alloy datasets. This evaluation approach simulates real applications where known low-component alloy data is used to predict properties of newly developed high-component alloys.
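
The train/test split by alloy order can be sketched as follows. The record layout, the helper names, and the compositions are assumptions made for illustration; the component count includes Mg, so pure Mg counts as 1 and a ternary alloy as 3.

```python
def n_components(composition, eps=1e-9):
    """Number of elements present with non-zero content (Mg included)."""
    return sum(1 for amount in composition.values() if amount > eps)


def split_by_order(records, max_train_components=3):
    """Train on systems up to ternary, test on quaternary and higher."""
    train, test = [], []
    for record in records:
        order = n_components(record["composition"])
        (train if order <= max_train_components else test).append(record)
    return train, test


# Hypothetical records; compositions in wt.% are illustrative only.
records = [
    {"name": "Mg", "composition": {"Mg": 100.0}},
    {"name": "Mg-Al", "composition": {"Mg": 97.0, "Al": 3.0, "Zn": 0.0}},
    {"name": "Mg-Al-Zn", "composition": {"Mg": 96.0, "Al": 3.0, "Zn": 1.0}},
    {"name": "Mg-Zn-Y-Zr",
     "composition": {"Mg": 95.0, "Zn": 3.0, "Y": 1.5, "Zr": 0.5}},
]

train, test = split_by_order(records)
print([r["name"] for r in train])
print([r["name"] for r in test])
```

Splitting on alloy order rather than at random is what makes this an extrapolation test: no quaternary or higher-order system is seen during training.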

As shown in Figure 8, the XGBoost model achieved R² values above 0.99 on all training sets but performed poorly on the test sets, with R² values below 0.50, suggesting overfitting. Despite the low R² values, incorporating CALPHAD features reduced the MAPE from 17.21% to 14.38% relative to models using only elemental properties. Further error analysis revealed that the thermal conductivities of the high-component alloys are distributed within 50.0-150.0 W/(m·K), with systematic overestimation below 100.0 W/(m·K). In this region, 76% of data points showed overestimation errors exceeding 10%, and 34% exceeded 30%.
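
A low R² alongside a moderate MAPE is not contradictory: R² compares the squared residuals with the spread of the target values, so it drops sharply when the targets span a narrow range and the predictions are biased at one end. The sketch below uses the standard definitions of both metrics with invented numbers that overestimate the low-conductivity points; it is an illustration, not a reproduction of the reported results.

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((t - p) / t)
                       for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Invented values in W/(m·K): narrow target range, overestimation at the low end.
y_true = [80.0, 90.0, 100.0, 110.0, 120.0]
y_pred = [100.0, 104.0, 106.0, 110.0, 114.0]

score_mape = mape(y_true, y_pred)
score_r2 = r2(y_true, y_pred)
print(round(score_mape, 2), round(score_r2, 3))
```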

Figure 8. Scatter plots comparing XGBoost predictions with experimental values across three feature subsets. (A-C) Low-component train sets; (D-F) High-component test sets.

The data density color maps in Figure 8A-C reveal that this decreased prediction accuracy can be largely attributed to the sparse distribution of training data from low-component systems in regions below 100.0 W/(m·K). Moreover, the component number color maps in Figure 8D-F demonstrate that prediction errors systematically increase with the number of alloying elements, indicating that the complexity of high-component systems poses additional challenges for model extrapolation. As shown in Supplementary Figure 2, neither temperature nor alloying element content shows significant correlation with prediction errors in high-component systems. These findings suggest that the prediction errors stem from both overfitting due to data sparsity and the inherent complexity of high-component systems.

Various approaches have been proposed to enhance the extrapolation capabilities of machine learning models. For instance, hierarchical prediction frameworks have shown success in studies of metal halide perovskites, where properties spanning different value ranges were effectively predicted through a multi-level classification strategy[43]. However, the relatively sparse data distribution in high-component Mg alloy systems makes it difficult to establish meaningful hierarchical classifications. In this work, we instead optimized the L1 and L2 regularization strengths. A grid search was conducted over the range 1E-3 to 1E4 to evaluate model performance under different combinations of the two strengths. The baseline used regularization parameters of 0.1 for both L1 and L2, as recommended by Bayesian optimization, without further tuning.
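
A grid search over the two regularization strengths can be sketched as below. The decade spacing of the grid is an assumption (the text gives only the range), `log_grid` and `grid_search` are hypothetical helpers, and `placeholder_score` is a synthetic stand-in with no relation to the study's results. In a real run, the scoring function would train a model with the given strengths and return the test-set MAPE; in XGBoost's scikit-learn interface the corresponding parameters are `reg_alpha` (L1) and `reg_lambda` (L2), although the text does not state which interface was used.

```python
import itertools
import math


def log_grid(lo_exp=-3, hi_exp=4):
    """Decade-spaced values from 1E-3 to 1E4 (spacing is an assumption)."""
    return [10.0 ** e for e in range(lo_exp, hi_exp + 1)]


def grid_search(evaluate, alphas, lambdas):
    """Evaluate every (L1, L2) pair; lower score is better."""
    results = {}
    best = None
    for alpha, lam in itertools.product(alphas, lambdas):
        score = evaluate(alpha, lam)
        results[(alpha, lam)] = score
        if best is None or score < best[1]:
            best = ((alpha, lam), score)
    return best, results


def placeholder_score(alpha, lam):
    """Synthetic surface with its minimum at (10, 100); NOT real model output."""
    return abs(math.log10(alpha) - 1.0) + abs(math.log10(lam) - 2.0)


grid = log_grid()
best, results = grid_search(placeholder_score, grid, grid)
print(best)
```

The `results` dictionary holds one score per grid cell, which is the form needed to draw a heat map of performance against the two strengths.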

The systematic optimization of L1 and L2 regularization significantly improved model extrapolation capability, as shown in Figure 9. The heat maps show how different combinations of regularization parameters affect model performance on both the low-component training sets and the high-component test sets. Optimal prediction performance was achieved with both regularization parameters set to 100, giving an average MAPE of 13.9% on the high-component test sets, an improvement of 5.4 percentage points over the 19.3% obtained without regularization optimization. The average MAPE on the low-component training sets remained acceptable at 7.3%. Among the three feature subsets, the combination incorporating DFT features showed the most promising results after regularization optimization, achieving the lowest MAPE of 13.6% on the high-component test sets, compared to 14.35% for elemental properties alone and 13.85% for the combination incorporating CALPHAD features.

Figure 9. Heat maps showing XGBoost model performance with L1 and L2 regularization optimization. (A-C) Low-component train sets; (D-F) High-component test sets.

The improved performance can be attributed to the complementary effects of L1 and L2 regularization mechanisms. L1 regularization contributes to model optimization through two primary channels: first, it promotes feature sparsity by effectively zeroing out less important features, thereby reducing model complexity; second, it implements dynamic feature selection during the training process, preventing any single feature from dominating the model’s decisions. Meanwhile, L2 regularization enhances model robustness in multiple ways: it reduces the model’s sensitivity to outliers and noise in the training data by constraining weight magnitudes, facilitates smoother transitions between different feature values, and helps prevent sharp discontinuities in the model’s predictions. This synergistic combination of regularization techniques effectively balances the model’s complexity and generalization ability, particularly crucial when extrapolating to more complex alloy systems where the underlying physical relationships become increasingly sophisticated. The optimized regularization parameters ensure that the incorporated DFT and CALPHAD features contribute meaningful physical insights while preventing overfitting, thereby enabling more reliable predictions for novel high-component Mg alloys.
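
At the level of a single tree leaf, the effect of the two penalties can be made concrete. In XGBoost's tree booster the L1 and L2 terms act on the leaf weights, and the optimal weight of a leaf follows from the summed gradients G and Hessians H of the loss over the samples in that leaf: the L1 strength soft-thresholds G and the L2 strength enlarges the denominator. The sketch below evaluates that expression for invented values of G and H; `leaf_weight` is a hypothetical helper written for illustration, not a library function.

```python
def leaf_weight(G, H, alpha=0.0, lam=0.0):
    """Optimal leaf weight w* = -T(G, alpha) / (H + lam), where T is the
    soft-threshold: G - alpha if G > alpha, G + alpha if G < -alpha, else 0."""
    if G > alpha:
        thresholded = G - alpha
    elif G < -alpha:
        thresholded = G + alpha
    else:
        return 0.0  # L1 penalty removes this leaf's contribution entirely
    return -thresholded / (H + lam)


# Invented gradient and Hessian sums for one leaf.
G, H = -10.0, 5.0

w_plain = leaf_weight(G, H)             # no penalty
w_l1 = leaf_weight(G, H, alpha=4.0)     # L1 shrinks the numerator
w_l2 = leaf_weight(G, H, lam=5.0)       # L2 enlarges the denominator
w_zero = leaf_weight(G, H, alpha=100.0) # strong L1 sets the weight to zero
print(w_plain, w_l1, w_l2, w_zero)
```

Both penalties pull leaf weights toward zero, so each tree makes smaller corrections and the ensemble follows the training data less closely.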

To further investigate factors affecting model extrapolation capability, different sampling ratios (5%-100%) of the low-component training set were evaluated across three feature subsets. As shown in Supplementary Figure 3, when using only 5% of the training data, the model exhibits poor stability with the highest MAPE of 21.0% and standard deviation of 3.4%. A significant improvement occurs at 20% training data, with MAPE decreasing to 15.7%, suggesting this represents a minimum threshold for effective model training. The prediction performance shows diminishing returns beyond 60% training data, where MAPE reaches 14.1% and only marginally improves to 13.9% with the complete dataset. This indicates that around 570 training points may be sufficient for achieving stable model performance, and future improvements should focus on acquiring targeted data from sparse regions rather than simply expanding the dataset size.
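
The sampling-ratio experiment can be sketched as a loop over training fractions with repeated random subsampling, reporting the mean and standard deviation of the test MAPE at each ratio. The number of repeats, the helper names, and the toy data are assumptions; `fit_mean` is a deliberately trivial stand-in for model training so that the example runs on its own.

```python
import random
import statistics


def learning_curve(train, test, fit, score, ratios, n_repeats=5, seed=0):
    """Mean and standard deviation of the test score per sampling ratio."""
    rng = random.Random(seed)
    curve = {}
    for ratio in ratios:
        k = max(1, round(ratio * len(train)))
        scores = []
        for _ in range(n_repeats):
            subset = rng.sample(train, k)  # sampling without replacement
            scores.append(score(fit(subset), test))
        curve[ratio] = (statistics.mean(scores), statistics.pstdev(scores))
    return curve


def fit_mean(rows):
    """Stand-in 'model': predicts the mean target of the training subset."""
    return sum(y for _, y in rows) / len(rows)


def mape_of_constant(prediction, rows):
    """MAPE in percent of a constant prediction on (x, y) rows."""
    return 100.0 * sum(abs(y - prediction) / y for _, y in rows) / len(rows)


# Toy (x, y) data, illustrative only.
train = [(x, 100 + x) for x in range(20)]
test = [(x, 100 + x) for x in range(20, 25)]
ratios = [0.05, 0.2, 0.6, 1.0]

curve = learning_curve(train, test, fit_mean, mape_of_constant, ratios)
for ratio in ratios:
    print(ratio, curve[ratio])
```

At a ratio of 1.0 every repeat uses the full training set, so the spread in this toy case is zero; for a stochastic learner some spread would remain.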

CONCLUSIONS

This study systematically collected 1,139 thermal conductivity experimental data points for multi-component, multiphase Mg alloy systems and constructed a domain knowledge-based, multiscale high-dimensional feature space incorporating elemental properties, thermodynamic properties, and electronic structure characteristics. Machine learning algorithms were employed to develop thermal conductivity prediction models for Mg alloys. Through feature selection and model interpretability analysis, we revealed the influence mechanisms of different features on Mg alloy thermal conductivity, providing theoretical guidance for the design and optimization of high thermal conductivity Mg alloys. The main conclusions are as follows:

1. Multiscale computed features, including atomic radius differences, enthalpy, cohesive energy, and the ratio of electronic thermal conductivity to relaxation time (κe/τ), contributed significantly to predicting Mg alloy thermal conductivity. By integrating these key features, the XGBoost model accurately predicted thermal conductivity for low-component Mg alloys (up to ternary systems) with a MAPE of 2.16%.

2. The incorporation of L1 and L2 regularization effectively enhanced the machine learning model’s extrapolation capability for new quaternary and higher-order systems. The regularization-optimized XGBoost model achieved a MAPE of 13.60% on the high-component alloy test set while maintaining good fitting accuracy on the low-component alloy training set (MAPE of 7.30%).

DECLARATIONS

Authors’ contributions

Conducted the main research work including machine learning model development, data processing, algorithm implementation and analysis: Chen, J.

Assisted in manuscript writing: Zhang, Y.

Responsible for manuscript proofreading: Luan, J.

Provided guidance on the editing and funding support: Fan, Y.

Provided guidance on DFT calculations and funding support: Liu, B.

Provided guidance on CALPHAD calculations and funding support: Chou, K.

Conceived and supervised the project and was responsible for manuscript review and editing: Yu, Z.

All authors have read and approved the final manuscript.

Availability of data and materials

All research data were obtained from published literature and related technical manuals. To support research reproducibility, the implementation codes, trained machine learning models, and a curated subset of representative data points are publicly available at https://github.com/Mat-Design-Yu/MgAlloy-ThermalCond-ML. While the complete database cannot be publicly shared due to ongoing research projects, it may be available from the corresponding author upon reasonable request and signing of a data usage agreement.

Financial support and sponsorship

This work was supported by the National Key Research and Development Program of China (No. 2021YFB3701001), the National Natural Science Foundation of China (Nos. 52274301, 52104306, 52334009), the Aeronautical Science Foundation of China (No. 2023Z0530S6005), the National Key Research and Development Program of China (No. 2023YFB3712401), the Science and Technology Commission of Shanghai Municipality (No. 21DZ1208900), Academician Workstation of Kunming University of Science and Technology (2024), Ningbo Yongjiang Talent-Introduction Programme (No. 2022A-023-C) and Zhejiang Phenomenological Materials Technology Co., Ltd., China.

Conflicts of interest

All authors declared that there are no conflicts of interest.

Ethical approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Copyright

© The Author(s) 2025.

Supplementary Materials

REFERENCES

1. Pan, H.; Pan, F.; Yang, R.; et al. Thermal and electrical conductivity of binary magnesium alloys. J. Mater. Sci. 2014, 49, 3107-24.

2. Li, S.; Yang, X.; Hou, J.; Du, W. A review on thermal conductivity of magnesium and its alloys. J. Magnes. Alloys. 2020, 8, 78-90.

3. Bai, J.; Yang, Y.; Wen, C.; et al. Applications of magnesium alloys for aerospace: a review. J. Magnes. Alloys. 2023, 11, 3609-19.

4. Zhang, W.; Ma, M.; Yuan, J.; et al. Microstructure and thermophysical properties of Mg−2Zn−xCu alloys. Trans. Nonferrous. Met. Soc. China. 2020, 30, 1803-15.

5. Li, G.; Zhang, J.; Wu, R.; et al. Development of high mechanical properties and moderate thermal conductivity cast Mg alloy with multiple RE via heat treatment. J. Mater. Sci. Technol. 2018, 34, 1076-84.

6. Bazhenov, V.; Koltygin, A.; Sung, M.; et al. Development of Mg–Zn–Y–Zr casting magnesium alloy with high thermal conductivity. J. Magnes. Alloys. 2021, 9, 1567-77.

7. Rong, J.; Zhu, J.; Xiao, W.; Zhao, X.; Ma, C. A high pressure die cast magnesium alloy with superior thermal conductivity and high strength. Intermetallics 2021, 139, 107350.

8. Liu, X.; Wu, Y.; Liu, Z.; Lu, C.; Xie, H.; Li, J. Thermal and electrical conductivity of as-cast Mg-4Y-xZn alloys. Mater. Res. Express. 2018, 5, 066532.

9. Rudajevová, A.; Staněk, M.; Lukáč, P. Determination of thermal diffusivity and thermal conductivity of Mg–Al alloys. Mater. Sci. Eng. A. 2003, 341, 152-7.

10. Yuan, G.; You, G.; Bai, S.; Guo, W. Effects of heat treatment on the thermal properties of AZ91D magnesium alloys in different casting processes. J. Alloys. Compd. 2018, 766, 410-6.

11. Sharma, P.; Johnson, D. D.; Balasubramanian, G.; Singh, P. Unraveling the connection of electronic and phononic structure with mechanical properties of commercial AZ80 alloy. Mater. Lett. 2024, 366, 136501.

12. Tong, Z.; Li, S.; Ruan, X.; Bao, H. Comprehensive first-principles analysis of phonon thermal conductivity and electron-phonon coupling in different metals. Phys. Rev. B. 2019, 100, 144306.

13. Zhou, S.; Jacobs, R.; Xie, W.; Tea, E.; Hin, C.; Morgan, D. Combined ab initio and empirical model of the thermal conductivity of uranium, uranium-zirconium, and uranium-molybdenum. Phys. Rev. Mater. 2018, 2, 083401.

14. Hu, M.; Yang, Z. Perspective on multi-scale simulation of thermal transport in solids and interfaces. Phys. Chem. Chem. Phys. 2021, 23, 1785-801.

15. Fang, J.; Xie, M.; He, X.; et al. Machine learning accelerates the materials discovery. Mater. Today. Commun. 2022, 33, 104900.

16. Juan, Y.; Niu, G.; Yang, Y.; et al. Accelerated design of Al−Zn−Mg−Cu alloys via machine learning. Trans. Nonferrous. Met. Soc. China. 2024, 34, 709-23.

17. Yuan, Y.; Sui, Y.; Li, P.; Quan, M.; Zhou, H.; Jiang, A. Multi-model integration accelerates Al-Zn-Mg-Cu alloy screening. J. Mater. Inf. 2024, 4, 23.

18. Lu, Z.; Kapoor, I.; Li, Y.; Liu, Y.; Zeng, X.; Wang, L. Machine learning driven design of high-performance Al alloys. J. Mater. Inf. 2024, 4, 19.

19. Sutton, C.; Boley, M.; Ghiringhelli, L. M.; Rupp, M.; Vreeken, J.; Scheffler, M. Identifying domains of applicability of machine learning models for materials science. Nat. Commun. 2020, 11, 4428.

20. Schleder, G. R.; Padilha, A. C. M.; Acosta, C. M.; Costa, M.; Fazzio, A. From DFT to machine learning: recent approaches to materials science - a review. J. Phys. Mater. 2019, 2, 032001.

21. Pederson, R.; Kalita, B.; Burke, K. Machine learning and density functional theory. Nat. Rev. Phys. 2022, 4, 357-8.

22. Xi, S.; Yu, J.; Bao, L.; et al. Machine learning-accelerated first-principles predictions of the stability and mechanical properties of L12-strengthened cobalt-based superalloys. J. Mater. Inf. 2022, 2, 15.

23. Yadav, N.; Chakraborty, N.; Tewari, A. Interval prediction machine learning models for predicting experimental thermal conductivity of high entropy alloys. Comput. Mater. Sci. 2022, 214, 111754.

24. Hart, G. L. W.; Mueller, T.; Toher, C.; Curtarolo, S. Machine learning for alloys. Nat. Rev. Mater. 2021, 6, 730-55.

25. Huang, L.; Liu, S.; Du, Y.; Zhang, C. Thermal conductivity of the Mg–Al–Zn alloys: experimental measurement and CALPHAD modeling. Calphad 2018, 62, 99-108.

26. Li, X.; Zheng, M.; Pan, H.; Mao, C.; Ding, W. An integrated design of novel RAFM steels with targeted microstructures and tensile properties using machine learning and CALPHAD. J. Mater. Inf. 2024, 4, 27.

27. Chen, Z.; Yang, Y. Data-driven design of eutectic high entropy alloys. J. Mater. Inf. 2023, 3, 10.

28. Chen, E.; Tamm, A.; Wang, T.; Epler, M. E.; Asta, M.; Frolov, T. Modeling antiphase boundary energies of Ni3Al-based alloys using automated density functional theory and machine learning. npj. Comput. Mater. 2022, 8, 755.

29. Liu, Y.; Wu, J.; Wang, Z.; et al. Predicting creep rupture life of Ni-based single crystal superalloys using divide-and-conquer approach based machine learning. Acta. Mater. 2020, 195, 454-67.

30. Zou, C.; Li, J.; Wang, W. Y.; et al. Integrating data mining and machine learning to discover high-strength ductile titanium alloys. Acta. Mater. 2021, 202, 211-21.

31. Wen, C.; Zhang, Y.; Wang, C.; et al. Machine learning assisted design of high entropy alloys with desired property. Acta. Mater. 2019, 170, 109-17.

32. Schaffnit, P.; Stallybrass, C.; Konrad, J.; Stein, F.; Weinberg, M. A Scheil–Gulliver model dedicated to the solidification of steel. Calphad 2015, 48, 184-8.

33. Schmid-Fetzer, R.; Zhang, F. The light alloy Calphad databases PanAl and PanMg. Calphad 2018, 61, 246-63.

34. Li, Z.; Hu, B.; Yao, F.; et al. Anomalous increase in thermal conductivity of Mg solid solutions by co-doping with two solute elements. Acta. Mater. 2025, 285, 120708.

35. Wang, D.; Amsler, M.; Hegde, V. I.; et al. Crystal structure, energetics, and phase stability of strengthening precipitates in Mg alloys: a first-principles study. Acta. Mater. 2018, 158, 65-78.

36. Wang, S.; Zhao, Y.; Guo, H.; Lan, F.; Hou, H. Mechanical and thermal conductivity properties of enhanced phases in Mg-Zn-Zr system from first principles. Materials 2018, 11, 2010.

37. Madsen, G. K.; Carrete, J.; Verstraete, M. J. BoltzTraP2, a program for interpolating band structures and calculating semi-classical transport coefficients. Comput. Phys. Commun. 2018, 231, 140-5.

38. Chandrashekar, G.; Sahin, F. A survey on feature selection methods. Comput. Electr. Eng. 2014, 40, 16-28.

39. Jaiswal, J. K.; Samikannu, R. Application of random forest algorithm on feature subset selection and classification and regression. In: 2017 World Congress on Computing and Communication Technologies (WCCCT), Tiruchirappalli, India, 02-04 Feb, 2017. IEEE, 2017; pp. 65-8.

40. Dong, S.; Wang, Y.; Li, J.; Li, Y.; Wang, L.; Zhang, J. Machine learning aided prediction and design for the mechanical properties of magnesium alloys. Met. Mater. Int. 2024, 30, 593-606.

41. de Myttenaere, A.; Golden, B.; Le Grand, B.; Rossi, F. Mean absolute percentage error for regression models. Neurocomputing 2016, 192, 38-48.

42. Cui, Y.; Li, S.; Ying, T.; Bao, H.; Zeng, X. Research on the thermal conductivity of metals based on first principles. Acta. Metall. Sin. 2021, 57, 375-84.

43. Saidi, W. A.; Shadid, W.; Castelli, I. E. Machine-learning structural and electronic properties of metal halide perovskites using a hierarchical convolutional neural network. npj. Comput. Mater. 2020, 6, 307.

How to Cite

Chen, J.; Zhang, Y.; Luan, J.; Fan, Y.; Yu, Z.; Liu, B.; Chou, K. Prediction of thermal conductivity in multi-component magnesium alloys based on machine learning and multiscale computation. J. Mater. Inf. 2025, 5, 22. http://dx.doi.org/10.20517/jmi.2024.89

© The Author(s) 2025. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, sharing, adaptation, distribution and reproduction in any medium or format, for any purpose, even commercially, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
