J Am Med Inform Assoc. 2025 Nov 19:ocaf198. doi: 10.1093/jamia/ocaf198. Online ahead of print.
ABSTRACT
OBJECTIVE: We propose Heterogeneity-aware Collaborative One-shot Lossless Algorithm for Generalized Linear Model (COLA-GLM-H), a novel one-shot lossless distributed algorithm that enables the integration of heterogeneous multi-institutional data while preserving patient privacy by avoiding patient-level data sharing.
MATERIALS AND METHODS: Generalized linear models (GLMs) are widely used in medical research to analyze diverse outcome types. For multi-institution settings, we show that the global likelihood can be reconstructed from institution-level summary statistics alone, which enables lossless estimation without access to individual records. We validated COLA-GLM-H in two real-world studies: (1) a centralized U.S. pediatric network (719,383 patients) evaluating long-term cardiovascular risks following COVID-19, and (2) an international decentralized network of 120,429 hospitalized patients from seven databases across three countries, assessing risk factors for COVID-19 mortality.
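The idea of rebuilding a global likelihood from site-level summaries can be illustrated with a minimal sketch. The abstract does not say which summary statistics COLA-GLM-H exchanges, so everything below is an assumption made for illustration: a logistic GLM, binary covariates, per-covariate-pattern counts as the shared summaries, and site-specific intercepts as the heterogeneity term. All function names (`site_summary`, `with_site_intercept`, `fit_logistic`, `run_demo`) are hypothetical, the data are simulated, and this is not the authors' implementation.

```python
import numpy as np


def site_summary(X, y):
    """Collapse patient-level rows into per-pattern counts (assumed summary format).

    Returns the unique covariate patterns, the number of patients per pattern,
    and the number of events per pattern. No patient-level row leaves the site.
    """
    patterns, inverse = np.unique(X, axis=0, return_inverse=True)
    inverse = np.ravel(inverse)  # shape of `inverse` differs across NumPy versions
    n = np.bincount(inverse, minlength=len(patterns)).astype(float)
    s = np.bincount(inverse, weights=y, minlength=len(patterns))
    return patterns, n, s


def with_site_intercept(X, site, n_sites):
    """Prepend one-hot site columns so each site gets its own intercept."""
    dummies = np.zeros((X.shape[0], n_sites))
    dummies[:, site] = 1.0
    return np.hstack([dummies, X])


def fit_logistic(Z, n, s, iters=50):
    """Newton-Raphson on the binomial log-likelihood
    sum_j [ s_j * eta_j - n_j * log(1 + exp(eta_j)) ], with eta = Z @ beta."""
    beta = np.zeros(Z.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(Z @ beta)))
        grad = Z.T @ (s - n * p)
        hess = (Z * (n * p * (1.0 - p))[:, None]).T @ Z
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def simulate_site(rng, size, intercept, slopes):
    """Simulated data only: binary covariates and a binary outcome."""
    X = rng.integers(0, 2, size=(size, len(slopes))).astype(float)
    p = 1.0 / (1.0 + np.exp(-(intercept + X @ slopes)))
    y = rng.binomial(1, p).astype(float)
    return X, y


def run_demo(seed=0):
    rng = np.random.default_rng(seed)
    slopes = np.array([0.8, -0.5])
    intercepts = [-1.0, 0.0, 0.5]  # hypothetical between-site heterogeneity
    sites = [simulate_site(rng, 500, a, slopes) for a in intercepts]
    k = len(sites)

    # Reference fit: pooled patient-level data (one row per patient).
    Z_pooled = np.vstack(
        [with_site_intercept(X, j, k) for j, (X, _) in enumerate(sites)]
    )
    y_pooled = np.concatenate([y for _, y in sites])
    pooled = fit_logistic(Z_pooled, np.ones(len(y_pooled)), y_pooled)

    # One-shot fit: each site sends its pattern counts once.
    Zs, ns, ss = [], [], []
    for j, (X, y) in enumerate(sites):
        patterns, n, s = site_summary(X, y)
        Zs.append(with_site_intercept(patterns, j, k))
        ns.append(n)
        ss.append(s)
    one_shot = fit_logistic(np.vstack(Zs), np.concatenate(ns), np.concatenate(ss))
    return pooled, one_shot


if __name__ == "__main__":
    pooled, one_shot = run_demo()
    print("max abs difference:", np.max(np.abs(pooled - one_shot)))
```

Under these assumptions the two fits agree up to floating-point rounding, because the grouped binomial log-likelihood built from the counts is algebraically the same function of the coefficients as the patient-level Bernoulli log-likelihood. Continuous covariates or other link functions would need different summaries, which this sketch does not cover.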
RESULTS: In the centralized network, COLA-GLM-H produced estimates identical to those from pooled analyses. In the decentralized setting, the algorithm effectively integrated heterogeneous data across multiple clinical institutions using a single communication round.
CONCLUSIONS: COLA-GLM-H is a privacy-preserving, lossless, and communication- and computation-efficient solution for multi-institutional research. It accounts for between-institution heterogeneity and supports all outcome types within the exponential family, enabling secure, scalable, and accurate analysis in collaborative clinical research.
PMID:41259033 | DOI:10.1093/jamia/ocaf198