Adm Policy Ment Health. 2025 Jul 14. doi: 10.1007/s10488-025-01454-x. Online ahead of print.
ABSTRACT
To develop screening guidelines, the Grading of Recommendations, Assessment, Development, and Evaluations (GRADE) Evidence to Decision (EtD) framework recommends careful assessment of both test accuracy and the downstream consequences of screening. To tailor recommendations to a specific context, GRADE EtD recommends ensuring that all assumptions and inputs on which the original recommendations are based are appropriate to the new setting. Perinatal depression screening is a notable example: evidence-based screening guidelines are issued at a national level, yet implementation necessarily occurs in specific local contexts. Methods to examine the generalizability of the assumptions underlying screening recommendations are needed. The GRADE EtD framework shows how local prevalence can be combined with evidence on screening sensitivity and specificity to estimate the number of true positive, false positive, true negative, and false negative results. In turn, these estimates can be linked to evidence of benefit and harm, such as potential benefits from treatment or stigma from false positive identification. To estimate benefit at a local level, we developed a simulation model that expresses prevalence as a function of sensitivity, specificity, and the proportion of patients who screen positive. We then identified published systematic reviews and meta-analyses of (a) perinatal depression prevalence, (b) screening accuracy, and (c) implementation of screening in clinical settings. Next, we used a participatory form of simulation modeling to estimate prevalence at a local level (a necessary first step to evaluating net benefit) and to explore alternative hypotheses through sensitivity analyses. We identified meta-analyses of prevalence and screening accuracy, as well as 14 screening studies with data sufficient to inform key questions.
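The two calculations described above, projecting prevalence plus accuracy into expected result counts and inverting an observed positive-screen rate back into prevalence, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' model: the function names and every numeric input (prevalence 0.12, sensitivity 0.85, specificity 0.85, a cohort of 1,000) are hypothetical. The inversion assumes the standard identity P(screen positive) = Se·p + (1 − Sp)(1 − p), solved for p (the Rogan-Gladen estimator).

```python
from dataclasses import dataclass


@dataclass
class ScreeningCounts:
    true_pos: float
    false_pos: float
    true_neg: float
    false_neg: float


def expected_counts(prevalence: float, sensitivity: float,
                    specificity: float, n: int = 1000) -> ScreeningCounts:
    """Expected TP/FP/TN/FN among n screened patients (EtD-style summary)."""
    cases = prevalence * n          # patients who truly have depression
    non_cases = n - cases           # patients who do not
    return ScreeningCounts(
        true_pos=sensitivity * cases,
        false_neg=(1 - sensitivity) * cases,
        true_neg=specificity * non_cases,
        false_pos=(1 - specificity) * non_cases,
    )


def prevalence_from_positive_rate(positive_rate: float, sensitivity: float,
                                  specificity: float) -> float:
    """Solve P(+) = Se*p + (1 - Sp)*(1 - p) for p (Rogan-Gladen).

    The result is deliberately NOT clipped to [0, 1], so implausible
    estimates remain visible to the analyst.
    """
    denom = sensitivity + specificity - 1
    if denom <= 0:
        raise ValueError("Se + Sp must exceed 1 for the inversion to be informative")
    return (positive_rate + specificity - 1) / denom


if __name__ == "__main__":
    # Hypothetical inputs chosen only for illustration.
    counts = expected_counts(prevalence=0.12, sensitivity=0.85, specificity=0.85)
    print(counts)
    positive_rate = (counts.true_pos + counts.false_pos) / 1000
    print(prevalence_from_positive_rate(positive_rate, 0.85, 0.85))
```

With these made-up inputs, about 23.4% of patients screen positive, and feeding that rate back through the inversion recovers a prevalence of approximately 0.12, confirming that the two functions are algebraic inverses.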
Simulation models estimated local prevalence as a function of positive screening rates and published estimates of sensitivity and specificity. These prevalence estimates displayed marked heterogeneity, including frequent implausible or impossible values (e.g., prevalence < 0%). Findings suggest that screening data are insufficient to estimate local prevalence and that sensitivity and specificity are not stable properties of screening questionnaires. Instead, study-level differences in context may be influential, such as variation in patients’ willingness to disclose depression symptoms across settings. Results highlight the opportunity for simulation modeling to inform evidence synthesis and decision-making.
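The impossible values mentioned above follow directly from the inversion: the numerator equals the observed positive rate minus the false positive rate (1 − Sp), so the estimate turns negative whenever a site's positive-screen rate is lower than the false positives alone would produce. A small sensitivity analysis over a grid of published-style accuracy values makes this visible. The sketch below is illustrative only; the 8% positive rate and the grid values are hypothetical and are not the study's data or results.

```python
from itertools import product


def prevalence_from_positive_rate(positive_rate: float, sensitivity: float,
                                  specificity: float) -> float:
    # Rogan-Gladen inversion, repeated here so this block runs on its own.
    denom = sensitivity + specificity - 1
    if denom <= 0:
        raise ValueError("Se + Sp must exceed 1 for the inversion to be informative")
    return (positive_rate + specificity - 1) / denom


def sensitivity_grid(positive_rate, sensitivities, specificities):
    """Return (Se, Sp, estimate, plausible) for every accuracy pair."""
    rows = []
    for se, sp in product(sensitivities, specificities):
        est = prevalence_from_positive_rate(positive_rate, se, sp)
        rows.append((se, sp, est, 0.0 <= est <= 1.0))
    return rows


if __name__ == "__main__":
    # Hypothetical clinic where 8% of patients screen positive.
    for se, sp, est, ok in sensitivity_grid(0.08, [0.75, 0.85, 0.95],
                                            [0.85, 0.90, 0.95]):
        flag = "" if ok else "  <- implausible"
        print(f"Se={se:.2f} Sp={sp:.2f} prevalence={est:+.3f}{flag}")
```

In this toy grid, every pair with specificity 0.85 or 0.90 implies a false positive rate of at least 10%, above the 8% observed positive rate, so those estimates fall below zero regardless of sensitivity. Only the specificity 0.95 row yields plausible values.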
PMID:40658348 | DOI:10.1007/s10488-025-01454-x