J Affect Disord. 2024 Oct 21:S0165-0327(24)01766-X. doi: 10.1016/j.jad.2024.10.070. Online ahead of print.
ABSTRACT
Imputation methods for missing data are not always applicable, for example when the data are completely missing for the whole sample. To estimate such missing data, we compared three missing-item substitution methods: (1) mean substitution; (2) last observation carried forward (LOCF); and (3) regression-predicted values. A total of 384 parents reported their 8- to 18-year-old children’s anxiety level using the 9-item Screen for Child Anxiety Related Disorders at baseline (Time 1) and at two later time points; the data were drawn from a larger longitudinal study (Ontario COVID-19 and Kids’ Mental Health Study). We predicted a survey item measured one month after baseline (Time 2) using: (1) the mean value of the rest of the test items; (2) the value of the same item measured at baseline; and (3) the predicted value from a linear regression with all other test items as predictors. Within-subjects ANOVA results showed a main effect of substitution method on the total score at Time 2. Post-hoc analysis indicated that mean substitution differed significantly from the actual data. Regression-predicted values overestimated the median compared to the actual values, while the LOCF estimation produced comparable means and identical medians. Similar results were found when using other indicators and when extending the analysis to a longer, 4-month interval (Time 3), suggesting that LOCF is more accurate and reliable than mean substitution or regression prediction. This study proposes that, when advanced substitution methods are not applicable, a systematic comparison of alternative methods may help researchers arrive at a more informed decision in data processing.
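The three substitution methods compared above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the use of NumPy, and the toy numbers are assumptions made for illustration. In particular, the abstract does not say on which data the regression was fitted; the sketch assumes the coefficients are estimated at baseline (Time 1) and then applied to the other items at Time 2.

```python
import numpy as np

def mean_substitution(other_items_t2):
    """Method 1: per-respondent mean of the remaining items at Time 2."""
    return np.asarray(other_items_t2, dtype=float).mean(axis=1)

def locf(item_t1):
    """Method 2: carry the baseline (Time 1) value of the same item forward."""
    return np.asarray(item_t1, dtype=float).copy()

def regression_prediction(other_items_t1, item_t1, other_items_t2):
    """Method 3: linear regression of the item on all other items.

    Assumption: coefficients are fitted on Time 1 data (where the item
    is observed) and applied to the other items at Time 2.
    """
    x1 = np.asarray(other_items_t1, dtype=float)
    x2 = np.asarray(other_items_t2, dtype=float)
    y1 = np.asarray(item_t1, dtype=float)
    # Prepend an intercept column to both design matrices.
    design_t1 = np.column_stack([np.ones(len(x1)), x1])
    design_t2 = np.column_stack([np.ones(len(x2)), x2])
    coef, *_ = np.linalg.lstsq(design_t1, y1, rcond=None)
    return design_t2 @ coef

def total_score(other_items_t2, substituted_item):
    """Total score: sum of the observed items plus the substituted item."""
    return np.asarray(other_items_t2, dtype=float).sum(axis=1) + substituted_item

if __name__ == "__main__":
    # Toy data (invented): 4 respondents, 2 "other" items instead of 8.
    others_t1 = [[0, 0], [1, 0], [0, 1], [1, 1]]
    item_t1 = [1, 2, 1, 2]
    others_t2 = [[2, 0], [0, 2], [1, 1], [0, 0]]
    for name, estimate in [
        ("mean", mean_substitution(others_t2)),
        ("LOCF", locf(item_t1)),
        ("regression", regression_prediction(others_t1, item_t1, others_t2)),
    ]:
        print(name, total_score(others_t2, estimate))
```

When the true item values are available, as in the study, the resulting total scores for each method can be compared against the actual totals, for example on their means and medians.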
PMID:39442699 | DOI:10.1016/j.jad.2024.10.070