JCO Clin Cancer Inform. 2025 May;9:e2400223. doi: 10.1200/CCI-24-00223. Epub 2025 May 2.
ABSTRACT
PURPOSE: To reduce costs in genomic studies of time-to-event phenotypes such as survival, researchers often sequence a subset of samples from a larger cohort. This process usually involves two phases: first, collecting inexpensive variables from all samples, and second, selecting a subset for expensive measurements, for example, sequencing-based biomarkers. Common two-phase designs include nested case-control and case-cohort designs. Additional designs sample subjects based on follow-up time, as in extreme case-control designs. Recently, an optimal two-phase design using a maximum likelihood-based method was proposed, which can accommodate arbitrary sample selection in the second phase. However, direct comparisons of this optimal design with others in terms of power and computational cost are lacking.
METHODS: This study performs a direct evaluation of typical two-phase designs, including Tao’s optimal design, on type I error, power, effect size estimation, and computational time, using both simulated and real data sets.
RESULTS: The optimal design had the highest power and accurate effect size estimation under the Cox regression model. Surprisingly, logistic regression achieved similar power with much lower computational cost than a more sophisticated method. The study further applied these methods to the MP2PRT study, reporting hazard ratios for the association of cancer subtypes with relapse risk.
CONCLUSION: Recommendations for selecting two-phase designs and analysis methods are provided with regard to power, bias of estimated effect size, and computational time.
PMID:40315406 | DOI:10.1200/CCI-24-00223
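
ILLUSTRATION: The phase-two selection step described in the abstract can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the study's code: the simulated cohort, the function names (simulate_cohort, case_cohort_sample, extreme_case_control_sample), and the sample sizes are all hypothetical. The case-cohort rule shown here is the common "random subcohort plus all cases" variant, and the extreme case-control rule is taken to mean the earliest events plus the longest-followed censored subjects. The optimal design and the nested case-control design are not shown.

```python
import random


def simulate_cohort(n, seed=0):
    """Phase one: cheap variables (follow-up time, event flag) for everyone.

    Event times are exponential and censoring times uniform; both choices
    are illustrative assumptions, not taken from the study.
    """
    rng = random.Random(seed)
    cohort = []
    for i in range(n):
        event_time = rng.expovariate(0.1)
        censor_time = rng.uniform(0, 20)
        cohort.append({
            "id": i,
            "time": min(event_time, censor_time),
            "event": event_time <= censor_time,
        })
    return cohort


def case_cohort_sample(cohort, subcohort_size, seed=0):
    """Phase two, case-cohort: a random subcohort plus every case."""
    rng = random.Random(seed)
    chosen = {s["id"] for s in rng.sample(cohort, subcohort_size)}
    chosen.update(s["id"] for s in cohort if s["event"])
    return chosen


def extreme_case_control_sample(cohort, n_each):
    """Phase two, extreme case-control: earliest events and the
    censored subjects with the longest follow-up."""
    cases = sorted((s for s in cohort if s["event"]),
                   key=lambda s: s["time"])
    controls = sorted((s for s in cohort if not s["event"]),
                      key=lambda s: s["time"], reverse=True)
    early_cases = {s["id"] for s in cases[:n_each]}
    long_controls = {s["id"] for s in controls[:n_each]}
    return early_cases | long_controls


cohort = simulate_cohort(1000)
cc_ids = case_cohort_sample(cohort, subcohort_size=100)
ecc_ids = extreme_case_control_sample(cohort, n_each=50)
print(len(cc_ids), len(ecc_ids))
```

Only the subjects whose ids are returned would receive the expensive measurement; the downstream analysis (for example, Cox or logistic regression) would then need to account for how they were selected.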