Random forest algorithm for predicting tobacco use and identifying determinants among pregnant women in 26 sub-Saharan African countries: a 2024 analysis

BMC Public Health. 2025 Apr 23;25(1):1506. doi: 10.1186/s12889-025-22794-1.

ABSTRACT

INTRODUCTION: Tobacco use during pregnancy is a significant public health concern, associated with adverse maternal and neonatal outcomes. Despite its importance, comprehensive data on tobacco use among pregnant women in sub-Saharan Africa are limited. Machine learning approaches can help clarify the factors associated with tobacco use and predict its occurrence among pregnant women, providing actionable insights for policy and intervention.

OBJECTIVE: This study aimed to predict tobacco use and identify its determinants among pregnant women in 26 sub-Saharan African (SSA) countries using a machine learning algorithm.

METHODS: Using Demographic and Health Surveys (DHS) data collected between 2016 and 2023 in 26 SSA countries, we analyzed responses from 33,705 pregnant women. A Random Forest classifier was used for prediction, complemented by SHAP (SHapley Additive exPlanations) for feature interpretability. Data preprocessing included K-nearest neighbor (KNN) imputation of missing values, the Synthetic Minority Over-sampling Technique (SMOTE) to handle class imbalance, and Recursive Feature Elimination (RFE) for feature selection. Model performance was evaluated using metrics including accuracy, recall, F1 score, and the area under the receiver operating characteristic curve (AUC-ROC).
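The two resampling and imputation steps named above can be illustrated with a small pure-Python sketch. This is a simplified illustration under stated assumptions, not the authors' code: the helper names knn_impute and smote are hypothetical, missing values are marked with None, and distances are plain Euclidean over the features both rows observe. Library implementations such as scikit-learn's KNNImputer and imbalanced-learn's SMOTE handle weighting and rescaling details omitted here.

```python
import math
import random

def _distance(a, b):
    """Euclidean distance over features observed in both rows (None marks a missing value)."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in pairs))

def knn_impute(rows, k=3):
    """Hypothetical helper: replace each missing value with the mean of that
    feature over the k nearest rows that observe it."""
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other rows where feature j is present, ranked by distance.
            donors = [r for idx, r in enumerate(rows) if idx != i and r[j] is not None]
            if not donors:
                raise ValueError(f"feature {j} is missing in every other row")
            donors.sort(key=lambda r: _distance(row, r))
            nearest = donors[:k]
            filled[i][j] = sum(r[j] for r in nearest) / len(nearest)
    return filled

def smote(minority, n_new, k=5, seed=0):
    """Hypothetical helper: create n_new synthetic minority rows, each placed on the
    segment between a minority row and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority-class rows")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (r for r in minority if r is not base), key=lambda r: _distance(base, r)
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1): position along the segment
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic
```

For example, knn_impute([[1.0, 2.0], [1.5, None], [3.0, 4.0], [8.0, 9.0]], k=2) fills the gap with 3.0, the mean of the two rows closest on the first feature. In practice SMOTE is applied only to the training split, so that synthetic rows never appear in the evaluation data.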

RESULTS: The Random Forest model demonstrated robust performance, achieving an AUC-ROC of 98%, recall of 94%, and F1 score of 93%. Key predictors identified included maternal literacy, maternal education, wealth index, distance to healthcare facilities, and place of residence. Pregnant women with lower educational attainment, residing in rural areas, and from lower wealth quintiles were more likely to use tobacco.
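The reported figures are standard binary-classification metrics; an AUC-ROC of 98% corresponds to an AUC of 0.98 on the usual 0 to 1 scale. As a hedged illustration using hypothetical toy labels and scores (not study data), recall, precision, F1, and AUC-ROC can be computed as below. The function names are illustrative, and AUC is computed through its pairwise (Mann-Whitney) form, which equals the area under the ROC curve.

```python
def recall_precision_f1(y_true, y_pred):
    """Recall, precision and F1 for the positive class (1 = tobacco use in this toy example)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return recall, precision, f1

def auc_roc(y_true, scores):
    """AUC-ROC as the probability that a random positive is scored above a
    random negative (ties count one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical toy labels and predicted probabilities (not study data).
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # threshold at 0.5
recall, precision, f1 = recall_precision_f1(y_true, y_pred)
auc = auc_roc(y_true, scores)
print(f"recall={recall:.3f} precision={precision:.3f} f1={f1:.3f} auc={auc:.3f}")
# prints: recall=0.667 precision=0.667 f1=0.667 auc=0.889
```

Recall and F1 depend on the chosen decision threshold, whereas AUC-ROC summarizes ranking quality across all thresholds, which is why the two kinds of metric are usually reported together.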

CONCLUSION AND RECOMMENDATIONS: This study used a Random Forest machine learning algorithm to identify key predictors of tobacco use among pregnant women across 26 sub-Saharan African countries. Significant factors included maternal literacy, education, wealth index, and healthcare access, highlighting systemic inequities that contribute to tobacco dependency during pregnancy. These findings support policies that address educational disparities, economic inequalities, and barriers to healthcare access in order to reduce tobacco use and improve maternal and neonatal outcomes. Future research should incorporate longitudinal data to improve predictive accuracy and inform policy development.

PMID:40269837 | DOI:10.1186/s12889-025-22794-1