Optimizing Data Extraction: Harnessing RAG and LLMs for German Medical Documents

Stud Health Technol Inform. 2024 Aug 22;316:949-950. doi: 10.3233/SHTI240567.

ABSTRACT

In medical data analysis, converting unstructured text documents into a structured format suitable for further use is a significant challenge. This study introduces an automated, locally deployed, privacy-preserving pipeline that uses open-source Large Language Models (LLMs) with a Retrieval-Augmented Generation (RAG) architecture to convert German-language medical documents containing sensitive health information into a structured format. On a proprietary dataset of 800 original, unstructured medical reports, the pipeline achieved a data-extraction accuracy of up to 90% compared with data extracted manually by physicians and medical students. This highlights the pipeline’s potential as a valuable tool for efficiently extracting relevant data from unstructured sources.
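The retrieval step of a RAG extraction pipeline, and the field-level comparison against manual extraction, can be illustrated with a minimal sketch. The abstract does not specify the chunking scheme, retriever, prompt, model runtime, or matching criterion, so everything below is an assumption:

- Sentence chunking, lexical term-frequency cosine similarity, the prompt wording, and exact-match scoring are hypothetical stand-ins. A real pipeline would more likely use embedding-based retrieval.
- All function names are illustrative.
- The sample report is invented.
- The generation step, which would send the prompt to a locally hosted open-source LLM, is not shown.

```python
import math
import re
from collections import Counter


def split_sentences(text: str) -> list[str]:
    """Chunk a report into sentences (hypothetical chunking scheme)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens; Unicode-aware, so umlauts and ß are kept."""
    return re.findall(r"\w+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query.

    Lexical stand-in (assumption) for an embedding-based retriever.
    """
    q = Counter(tokenize(query))
    ranked = sorted(chunks, key=lambda c: cosine(q, Counter(tokenize(c))), reverse=True)
    return ranked[:k]


def build_prompt(field: str, question: str, context: list[str]) -> str:
    """Assemble a hypothetical extraction prompt from the retrieved chunks."""
    ctx = "\n".join(f"- {c}" for c in context)
    return (
        f"Context from the medical report:\n{ctx}\n\n"
        f"Question: {question}\n"
        f'Answer only with JSON of the form {{"{field}": "<value or null>"}}.'
    )


def _normalize(value) -> str:
    """Normalize a field value for comparison (None counts as empty)."""
    return "" if value is None else str(value).strip().lower()


def field_accuracy(predicted: dict, reference: dict) -> float:
    """Share of manually annotated fields reproduced by the pipeline.

    Uses a case-insensitive exact match (assumed criterion).
    """
    if not reference:
        return 0.0
    hits = sum(_normalize(predicted.get(k)) == _normalize(v) for k, v in reference.items())
    return hits / len(reference)


if __name__ == "__main__":
    # Invented example report, not from the study's dataset.
    report = ("Patient: 67 Jahre, männlich. Diagnose: Vorhofflimmern. "
              "Medikation: Apixaban 5 mg zweimal täglich. Nächste Kontrolle in 3 Monaten.")
    question = "Welche Diagnose wurde gestellt?"
    context = retrieve(question, split_sentences(report), k=1)
    # This prompt would then be sent to a locally hosted LLM (not shown).
    print(build_prompt("diagnosis", question, context))
    print(field_accuracy({"diagnosis": "Vorhofflimmern"}, {"diagnosis": "vorhofflimmern"}))
```

Retrieving only the relevant passages before generation keeps the prompt short and grounds the model's answer in the report text. Running retrieval and generation entirely on local infrastructure is what allows sensitive reports to stay on site. Averaging `field_accuracy` over all reports gives one possible reading of an "accuracy compared with manual extraction" figure. The study's actual metric may differ.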

PMID:39176948 | DOI:10.3233/SHTI240567