Beyond labels: determining the true type of blood gas samples in ICU patients through supervised machine learning

BMC Med Inform Decis Mak. 2025 Jul 24;25(1):275. doi: 10.1186/s12911-025-03115-3.

ABSTRACT

BACKGROUND: In the Intensive Care Unit (ICU), data stored in patient data management systems (PDMS) is commonly used in clinical practice and research. Parameters from point-of-care arterial blood gas (BG) analysis are used in the diagnosis and definition of syndromes such as sepsis and ARDS, but manual entry of the blood source (arterial or venous) into the PDMS introduces the risk of mislabeling venous samples as arterial. Our study aimed to employ supervised machine learning to accurately identify blood gas samples as arterial or venous using PDMS data.

METHODS: We conducted a retrospective, single-center observational cohort study including all blood gases analyzed during 2018 in a Swedish pediatric and adult general ICU. Chemical parameters from BG analysis and clinical parameters such as mean arterial pressure (MAP) and peripheral oxygen saturation (SpO2) were used as features. A specialist physician in intensive care manually determined the true class of each sample through comprehensive retrospective chart review. The samples were split into training, test and holdout sets. Models were trained using cross-validation in the training set, with forward stepwise feature selection and Bayesian hyperparameter optimization, and performance was assessed using the area under the precision-recall curve (AUCPR) in the test set. The best model was compared to a multivariate logistic regression (LR) model in the holdout set.
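The split-and-evaluate step described above can be illustrated with a minimal, standard-library-only sketch. Everything in it is an assumption made for illustration: the 60/20/20 split fractions, the synthetic labels and scores, and the helper names `stratified_three_way_split` and `average_precision` are hypothetical and are not taken from the study. AUCPR is approximated here by average precision, one common estimator; the abstract does not state which estimator the authors used.

```python
import random
from collections import defaultdict


def stratified_three_way_split(labels, fractions=(0.6, 0.2, 0.2), seed=0):
    """Return (train, test, holdout) index lists, preserving class ratios.

    The 60/20/20 fractions are illustrative; the abstract does not state them.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, test, holdout = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = int(len(indices) * fractions[0])
        n_test = int(len(indices) * fractions[1])
        train += indices[:n_train]
        test += indices[n_train:n_train + n_test]
        holdout += indices[n_train + n_test:]  # remainder goes to holdout
    return train, test, holdout


def average_precision(y_true, scores):
    """Step-wise area under the precision-recall curve (average precision).

    Samples are ranked by descending score, and precision is accumulated at
    each rank where a positive (label 1) occurs. Ties keep input order.
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("need at least one positive sample")
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / n_pos


# Synthetic, imbalanced data (about 91% of samples in class 1).
rng = random.Random(42)
labels = [1 if rng.random() < 0.91 else 0 for _ in range(2000)]
# Hypothetical classifier output: a noisy score that is higher for class 1.
scores = [rng.gauss(1.0 if y else -1.0, 1.0) for y in labels]

train_idx, test_idx, holdout_idx = stratified_three_way_split(labels)
test_ap = average_precision(
    [labels[i] for i in test_idx], [scores[i] for i in test_idx]
)
print(f"test-set AUCPR estimate: {test_ap:.3f}")
```

In a pipeline of the kind the abstract describes, the test set would guide model selection, while the holdout set would be touched only once, for the final comparison against the LR model.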

RESULTS: Among 33,800 samples (30,753 arterial, 3,047 non-arterial) from 691 ICU admissions, 150 (0.44%) were erroneously marked. The best performing algorithm was extreme gradient boosting (XGBoost) using 9 features, with an AUCPR of 0.9974 (95% CI 0.9961-0.9984), significantly better than the LR model (AUCPR = 0.9791, 95% CI 0.9651-0.9904).
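As a quick check on the reported counts, the short sketch below recomputes the mislabeling rate and the class prevalences. The prevalences are useful context because a classifier that ranks samples at random has an expected AUCPR close to the prevalence of the positive class. The abstract does not state which class was treated as positive, so both prevalences are shown; the variable names are illustrative.

```python
# Counts taken from the RESULTS paragraph.
arterial, non_arterial, mislabeled = 30_753, 3_047, 150

total = arterial + non_arterial
mislabel_rate = mislabeled / total
arterial_prevalence = arterial / total
non_arterial_prevalence = non_arterial / total

print(f"total samples:           {total}")                         # 33800
print(f"mislabeled:              {mislabel_rate:.2%}")             # 0.44%
print(f"arterial prevalence:     {arterial_prevalence:.1%}")       # 91.0%
print(f"non-arterial prevalence: {non_arterial_prevalence:.1%}")   # 9.0%
```

If arterial were the positive class, a random ranking would already score an AUCPR near 0.91, so the gap between 0.9791 and 0.9974 is best read against that baseline rather than against 0.5.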

CONCLUSION: Supervised machine learning can accurately determine blood gas sample type in ICU patients. This approach shows promise for improving the accuracy of research and clinical applications that rely on blood gas data.

PMID:40707901 | DOI:10.1186/s12911-025-03115-3