Bioinformatics. 2024 Dec 2:btae726. doi: 10.1093/bioinformatics/btae726. Online ahead of print.
ABSTRACT
SUMMARY: Extensive human health data from cohort studies, national registries, and biobanks can reveal lifecourse risk factors impacting health. Combining these sources offers increased statistical power, detection of rare outcomes, replication of findings, and extended study periods. Traditionally, this required either transferring data to a central location or having each partner run separate analyses and pooling the summary statistics, both of which pose ethical, legal, and time constraints. Federated analysis, which involves remote data analysis without sharing individual-level data, is a promising alternative. One such solution is DataSHIELD (https://datashield.org/), an open-source, R-based implementation. To enable federated analysis, data owners need a user-friendly way to install the federated infrastructure and manage users and data. Here we present MOLGENIS Armadillo: a lightweight server for federated analysis solutions such as DataSHIELD.
AVAILABILITY AND IMPLEMENTATION: Armadillo is implemented as a collection of three packages freely available under the open-source licence LGPLv3: two R packages downloadable from the Comprehensive R Archive Network (CRAN) ("MolgenisArmadillo" and "DSMolgenisArmadillo") and one Java application ("ArmadilloService") available as JAR and Docker images via GitHub (https://github.com/molgenis/molgenis-service-armadillo).
SUPPLEMENTARY MATERIALS: The Supplementary material includes screenshots of the user interface (UI) to illustrate the use of Armadillo.
PMID:39673440 | DOI:10.1093/bioinformatics/btae726