Machine learning-based prediction algorithm of spontaneous preterm birth using multi-source data

BMC Pregnancy Childbirth. 2025 Dec 3. doi: 10.1186/s12884-025-08541-9. Online ahead of print.

ABSTRACT

BACKGROUND: Spontaneous preterm birth (sPTB) is a complex condition with unclear etiology, associated with increased neonatal risks. Early prediction of sPTB enables timely interventions to improve outcomes. Our study aimed to construct machine learning (ML) models to predict sPTB using multi-source data, including electronic health records (EHR) and environmental factors.

METHODS: This retrospective cohort study included 54,132 singleton pregnancies from Wuhan Children’s Hospital (Wuhan Maternal and Child Healthcare Hospital) between December 2012 and December 2022. We collected 82 predictors from multiple sources: demographics, routine prenatal tests, air pollution exposure, meteorological factors, and greenness exposure. Extreme gradient boosting (XGBoost), random forest (RF), support vector machine (SVM), and logistic regression (LR) were used to construct predictive models of sPTB. Screening performance was assessed via the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC). Shapley additive explanations (SHAP) values were computed to quantify each feature's contribution to the prediction.
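
The two screening metrics can be illustrated with a minimal sketch. The abstract does not say how the metrics were computed, so the functions below are only the standard textbook definitions in plain Python; the labels and scores are made-up toy values, not study data, and the names `auroc` and `average_precision` are illustrative. AUPRC is approximated here by average precision, a common estimator.

```python
def auroc(y_true, scores):
    """AUROC as the probability that a randomly chosen positive case
    is scored above a randomly chosen negative case (ties count half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """Average precision: mean of the precision values measured at the
    rank of each positive case. Assumes no tied scores."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("at least one positive case is required")
    true_pos = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_pos += 1
            total += true_pos / rank
    return total / n_pos


# Toy data (hypothetical): 1 = positive outcome, 0 = negative outcome.
y_true = [0, 0, 1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.10, 0.20, 0.90, 0.30, 0.40, 0.50, 0.05, 0.80, 0.15, 0.35]

prevalence = sum(y_true) / len(y_true)
print(f"AUROC             = {auroc(y_true, scores):.3f}")
print(f"Average precision = {average_precision(y_true, scores):.3f}")
print(f"Prevalence        = {prevalence:.3f}")
```

A classifier that ranks at random has an expected AUROC of 0.5, whereas its expected AUPRC is roughly the prevalence of the positive class. For a rare outcome the two metrics are therefore read against very different baselines.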

RESULTS: The XGBoost model yielded the best performance in the test set with an AUROC of 0.926 and an AUPRC of 0.502. Eosinophils percentage, albumin, uric acid, amniotic fluid pocket, and sulfur dioxide exposure during late pregnancy were identified as the most important predictors of sPTB.
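
Feature rankings of this kind rest on Shapley values, which the following sketch computes exactly for a toy model by enumerating every feature ordering. It is an illustration of the underlying idea only, not the study's pipeline. The model `toy_predict`, its coefficients, and the all-zero baseline are hypothetical, and replacing "absent" features with a fixed baseline is a simplifying assumption (SHAP implementations instead average over background data and use faster algorithms).

```python
from itertools import permutations


def shapley_values(predict, x, baseline):
    """Exact Shapley values by brute force: average each feature's
    marginal contribution over all orderings in which features are
    switched from their baseline value to their observed value."""
    n = len(x)
    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous = predict(current)
        for i in order:
            current[i] = x[i]           # reveal feature i
            updated = predict(current)
            phi[i] += updated - previous
            previous = updated
    return [total / len(orderings) for total in phi]


def toy_predict(v):
    """Hypothetical risk score: one additive term plus one interaction."""
    return 0.1 + 0.3 * v[0] + 0.2 * v[1] * v[2]


x = [1.0, 1.0, 1.0]          # observed feature values (made up)
baseline = [0.0, 0.0, 0.0]   # reference values standing in for "absent"

phi = shapley_values(toy_predict, x, baseline)
for index, value in enumerate(phi):
    print(f"feature {index}: contribution {value:+.3f}")

# Efficiency property: contributions sum to the prediction minus the baseline prediction.
print(f"sum = {sum(phi):.3f}, "
      f"difference = {toy_predict(x) - toy_predict(baseline):.3f}")
```

The interaction term is split equally between the two features that share it. The cost grows factorially with the number of features, so this brute-force form is only practical for a handful of inputs.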

CONCLUSIONS: Our results demonstrate that combining EHR data and environmental factors with ML methods yields sPTB predictions with high discrimination but only moderate precision. Although the model's discriminatory power is promising, its precision requires improvement before clinical application.

PMID:41339813 | DOI:10.1186/s12884-025-08541-9