
Figure 2. Predictive performance (AUC) of machine learning vs. traditional models. Boxplots show the distribution of discrimination scores (AUC) for categorical outcome prediction models, grouped by model type. Each plot shows the median, the IQR, and whiskers extending to the smallest and largest values within 1.5 × IQR of the quartiles, overlaid with the score of each selected model from the included studies. For studies reporting multiple scores for a predictive model, one representative score was selected per model according to the evaluation method, in the following order of preference: external validation, internal validation, and lastly, training data. Machine learning models (N = 14): AUC median [IQR]: 0.825 [0.76, 0.84]. Traditional models (N = 15): AUC median [IQR]: 0.74 [0.71, 0.81]. Total (N = 29): AUC median [IQR]: 0.79 [0.73, 0.84]. AUC: area under the receiver operating characteristic curve; IQR: interquartile range.
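The selection rule and summary statistics described in the caption can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the names `PREFERENCE`, `select_score`, and `summarize` are hypothetical, the AUC values are invented for the example, and the quartile method (`"inclusive"`) is an assumption, since the caption does not say how the quartiles were computed.

```python
from statistics import median, quantiles

# Order of preference stated in the caption: the lowest rank wins.
PREFERENCE = {"external": 0, "internal": 1, "training": 2}


def select_score(scores):
    """Pick one representative AUC from a {evaluation_method: auc} mapping."""
    best_method = min(scores, key=PREFERENCE.__getitem__)
    return scores[best_method]


def summarize(values):
    """Return (median, Q1, Q3). The quartile method is an assumption."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return median(values), q1, q3


# Hypothetical models and AUCs, for illustration only (not study data).
models = [
    {"training": 0.90, "internal": 0.82, "external": 0.78},
    {"training": 0.85, "internal": 0.80},
    {"training": 0.74},
]

selected = [select_score(m) for m in models]
med, q1, q3 = summarize(selected)
print(f"selected={selected} median={med:.2f} IQR=[{q1:.2f}, {q3:.2f}]")
```

A model evaluated only on training data still contributes a score, so under this rule the plotted distribution may mix scores with different degrees of optimism.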