Diagnostic Performance of Machine Learning-based Models in Neonatal Sepsis: A Systematic Review

Pediatr Infect Dis J. 2024 Jul 26. doi: 10.1097/INF.0000000000004409. Online ahead of print.

ABSTRACT

BACKGROUND: Timely diagnosis of neonatal sepsis is challenging. We aimed to systematically evaluate the diagnostic performance of sophisticated machine learning (ML) techniques for the prediction of neonatal sepsis.

METHODS: We searched MEDLINE, Embase, Web of Science and Cochrane CENTRAL databases using “neonate,” “sepsis” and “machine learning” as search terms. We included studies that developed or validated an ML algorithm to predict neonatal sepsis. Those incorporating automated vital-sign data were excluded. Among 5008 records, 74 full-text articles were screened. Two reviewers extracted information as per the CHARMS (CHecklist for critical Appraisal and data extraction for systematic Reviews of prediction Modelling Studies) checklist. We followed the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-analyses) guideline extension for diagnostic test accuracy reviews and used the PROBAST tool for risk of bias assessment. The primary outcome was the predictive performance of ML models in terms of sensitivity, specificity and positive and negative predictive values. We generated a hierarchical summary receiver operating characteristics curve for the pooled analysis.
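
The four outcome measures named above follow directly from a 2x2 table of model predictions against the reference diagnosis. The sketch below is illustrative only: the function name `diagnostic_metrics` and the example counts are assumptions made here, not data or code from the review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiagnosticMetrics:
    sensitivity: float  # TP / (TP + FN)
    specificity: float  # TN / (TN + FP)
    ppv: float          # TP / (TP + FP)
    npv: float          # TN / (TN + FN)


def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> DiagnosticMetrics:
    """Compute the four measures from 2x2 confusion-matrix counts (hypothetical helper)."""
    if min(tp, fp, fn, tn) < 0:
        raise ValueError("counts must be non-negative")
    if 0 in (tp + fn, tn + fp, tp + fp, tn + fn):
        raise ValueError("every row and column of the 2x2 table must be non-empty")
    return DiagnosticMetrics(
        sensitivity=tp / (tp + fn),
        specificity=tn / (tn + fp),
        ppv=tp / (tp + fp),
        npv=tn / (tn + fn),
    )


# Made-up counts for illustration: 100 septic and 100 non-septic neonates.
example = diagnostic_metrics(tp=87, fp=11, fn=13, tn=89)
print(example)
```

Sensitivity and specificity are properties of the model at a given threshold, whereas the predictive values also depend on how common sepsis is in the evaluated sample.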

RESULTS: Across 19 studies (15,984 participants) reporting 76 ML models, random forest was the most frequently used algorithm. The number of candidate predictors per model ranged from 5 to 93; most models included birth weight and gestation. None performed external validation. The risk of bias was high in 18 studies. For the prediction of any sepsis (14 studies), pooled sensitivity was 0.87 (95% credible interval: 0.75-0.94) and specificity was 0.89 (95% credible interval: 0.77-0.95). Pooled area under the receiver operating characteristics curve was 0.94 (95% credible interval: 0.92-0.96). All studies except one used data from high- or upper-middle-income countries. Because probability thresholds were unavailable, the performance could not be assessed with sufficient precision.
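
To show how the pooled point estimates would translate into predictive values, the sketch below applies Bayes' theorem to the reported sensitivity (0.87) and specificity (0.89). The prevalence values are hypothetical and chosen only for illustration; the abstract does not report prevalence, and the function name `predictive_values` is an assumption made here.

```python
from typing import Tuple


def predictive_values(
    sensitivity: float, specificity: float, prevalence: float
) -> Tuple[float, float]:
    """Return (PPV, NPV) for a given prevalence using Bayes' theorem."""
    for name, value in (
        ("sensitivity", sensitivity),
        ("specificity", specificity),
        ("prevalence", prevalence),
    ):
        if not 0.0 < value < 1.0:
            raise ValueError(f"{name} must lie strictly between 0 and 1")
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


POOLED_SENSITIVITY = 0.87  # pooled point estimate reported in the abstract
POOLED_SPECIFICITY = 0.89  # pooled point estimate reported in the abstract

# Hypothetical prevalence values, not taken from the review.
for prevalence in (0.01, 0.05, 0.20):
    ppv, npv = predictive_values(POOLED_SENSITIVITY, POOLED_SPECIFICITY, prevalence)
    print(f"prevalence={prevalence:.2f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

Under these assumptions the positive predictive value falls sharply as prevalence drops, which is one reason the same sensitivity and specificity can mean different things in different settings. The calculation uses point estimates only and ignores the credible intervals.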

CONCLUSIONS: ML techniques have good diagnostic accuracy for neonatal sepsis. The findings highlight the need to develop context-specific models from high-burden countries.

PMID:39079037 | DOI:10.1097/INF.0000000000004409