Developing an explainable machine learning model to predict false-negative citrin deficiency cases in newborn screening

Orphanet J Rare Dis. 2025 Oct 8;20(1):507. doi: 10.1186/s13023-025-04045-z.

ABSTRACT

BACKGROUND: Neonatal Intrahepatic Cholestasis caused by Citrin Deficiency (NICCD) is an autosomal recessive disorder affecting the urea cycle and energy metabolism. Newborn screening (NBS) usually relies on elevated citrulline, but some patients have normal citrulline levels, which leads to false-negative results and delayed diagnosis. This study developed an explainable machine learning (ML) model to predict false-negative NICCD cases during NBS.

METHODS: Data from 53 false-negative NICCD patients and 212 controls, collected retrospectively between 2011 and 2024, were analyzed. The dataset was split into a training set (70%) and a test set (30%). External validation used 48 participants from distinct time periods. Key predictors were identified using variable importance in projection (VIP > 1) and Lasso regression. Six ML models were trained and compared: Logistic Regression, Random Forest, Light Gradient Boosting Machine, Extreme Gradient Boosting (XGBoost), K-Nearest Neighbor, and Support Vector Machine. Performance was evaluated using the area under the receiver operating characteristic curve (AUC) and the F1 score. Shapley Additive exPlanations (SHAP) was used to quantify feature importance and interpret the models.
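The two evaluation metrics named above can be illustrated with a short, dependency-free sketch. It computes AUC as the probability that a randomly chosen case outscores a randomly chosen control (the Mann-Whitney formulation, with ties counted as half) and F1 as the harmonic mean of precision and recall at a fixed decision threshold. The function names and the 0.5 threshold are assumptions for illustration; the abstract does not state which threshold or software the authors used.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a random positive > score of a random negative).

    Ties count as half a win, which matches the Mann-Whitney U formulation.
    O(n_pos * n_neg), which is fine for a few hundred participants.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def f1_score(labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> float:
    """F1 = 2 * precision * recall / (precision + recall) at a fixed threshold (assumed 0.5)."""
    tp = fp = fn = 0
    for y, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 1:
            fn += 1
    if tp == 0:
        return 0.0  # precision and recall are both zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical toy data: 1 = false-negative NICCD case, 0 = control.
    y = [1, 1, 0, 0, 0]
    p = [0.9, 0.4, 0.6, 0.2, 0.1]
    print(round(roc_auc(y, p), 4))  # → 0.8333
    print(f1_score(y, p))           # → 0.5
```

Because cases and controls were roughly 1:4 in this study, AUC (threshold-free ranking quality) and F1 (threshold-dependent balance of precision and recall on the minority class) capture different aspects of performance, which is a reasonable motivation for reporting both.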

RESULTS: Birth weight, citrulline, glycine, phenylalanine, ornithine, arginine, proline, succinylacetone, and C10:2 were selected as predictive features. Among the ML models, XGBoost demonstrated the most robust and consistent performance, achieving AUCs of 0.971 (95% CI: 0.959-0.979), 0.968, and 0.977, and F1 scores of 0.786 (95% CI: 0.744-0.820), 0.828, and 0.833 in the training, test, and external validation sets, respectively. SHAP analysis identified citrulline, glycine, phenylalanine, succinylacetone, birth weight, and ornithine as the most important features. Feature pairs such as citrulline-phenylalanine, citrulline-glycine, succinylacetone-birth weight, and ornithine-glycine showed varying interactions. SHAP force plots, decision plots, and waterfall plots provided patient-level interpretations. Finally, we built a web-based calculator for predicting false-negative NICCD cases (https://myapp123.shinyapps.io/my_shiny_app/).
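To show what a SHAP value represents for a single newborn's prediction, the sketch below computes exact Shapley values by enumerating every subset of features. A feature "absent" from a subset is replaced by a baseline value, which is one common choice of value function and is assumed here. The toy model, its weights, and the three unnamed features are hypothetical and are not the authors' XGBoost model. Exact enumeration costs 2^n model calls, so libraries such as SHAP use faster model-specific algorithms (for example for tree ensembles) in practice.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence, Set


def shapley_values(
    model: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Exact Shapley values for one prediction model(x), relative to model(baseline).

    Assumption: features outside a coalition take their baseline value.
    """
    n = len(x)

    def value(coalition: Set[int]) -> float:
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z: Sequence[float]) -> float:
    """Hypothetical score: two main effects plus one pairwise interaction."""
    return 2.0 * z[0] - 1.0 * z[1] + 0.5 * z[0] * z[2]


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0]
    base = [0.0, 0.0, 0.0]
    phi = shapley_values(toy_model, x, base)
    print([round(v, 6) for v in phi])  # → [2.75, -2.0, 0.75]
```

The attributions sum to model(x) - model(baseline) = 1.5, the "additive" property that force, decision, and waterfall plots visualize. The interaction term's contribution (1.5) is split equally between features 0 and 2, which is why interaction analyses are reported for feature pairs.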

CONCLUSION: An interpretable machine learning model utilizing metabolite and demographic data enhances the detection of false-negative NICCD cases, facilitates early identification and intervention, and ultimately improves the overall effectiveness of the newborn screening system.

PMID:41063287 | DOI:10.1186/s13023-025-04045-z

John Joseph