Lung ultrasound among Expert operator’S: ScOring and iNter-rater reliability analysis (LESSON study): a secondary COWS study analysis from the ITALUS group

J Anesth Analg Crit Care. 2024 Jul 31;4(1):50. doi: 10.1186/s44158-024-00187-x.

ABSTRACT

BACKGROUND: Lung ultrasonography (LUS) is a non-invasive imaging method used to diagnose and monitor conditions such as pulmonary edema, pneumonia, and pneumothorax. It is particularly valuable where other imaging techniques, such as CT scans or chest X-rays, are not readily accessible, especially in low- and middle-income countries with limited resources. Furthermore, LUS reduces radiation exposure and the associated risk of radiation-induced malignancies such as blood cancers, which is particularly relevant in children and young subjects. The score obtained with LUS allows semi-quantification of regional loss of aeration, and it can provide a valuable and reliable assessment of the severity of most respiratory diseases. However, the inter-observer reliability of the score has never been systematically assessed. This study aims to assess experienced LUS operators’ agreement on a sample of video clips showing predefined findings.

METHODS: Twenty-five anonymized video clips, comprehensively depicting the different values of the LUS score, were presented via an online form to renowned LUS experts who were blinded to the patients’ clinical data and to the study’s aims. Clips were acquired from five different ultrasound machines. The Fleiss-Cohen weighted kappa was used to evaluate the experts’ agreement.
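The abstract does not include the analysis code. As an illustration of the kind of agreement statistic used, the following is a minimal pure-Python sketch of the unweighted Fleiss' kappa for multiple raters; the study itself applied a Fleiss-Cohen *weighted* kappa, so this simplified version and the function name `fleiss_kappa` are assumptions, not the authors' actual analysis.

```python
def fleiss_kappa(ratings):
    """Unweighted Fleiss' kappa (illustrative sketch, not the study's code).

    ratings[i][j] = number of raters who assigned category j to subject i.
    Every subject must be rated by the same number of raters (>= 2).
    """
    n_subjects = len(ratings)
    n_raters = sum(ratings[0])        # raters per subject (constant by design)
    n_categories = len(ratings[0])

    # Per-subject observed agreement P_i
    p_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
           for row in ratings]
    p_bar = sum(p_i) / n_subjects     # mean observed agreement

    # Chance-expected agreement from overall category proportions
    totals = [sum(row[j] for row in ratings) for j in range(n_categories)]
    p_j = [t / (n_subjects * n_raters) for t in totals]
    p_e = sum(p * p for p in p_j)

    return (p_bar - p_e) / (1 - p_e)
```

In this study's setting, the input would be a 25 x 4 count matrix: 25 clips, each scored on the 0-3 LUS scale by 20 raters, so each row sums to 20.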

RESULTS: Over a period of 3 months, 20 experienced operators completed the assessment. They worked in the ICU (10), emergency department (6), high-dependency unit (2), cardiology ward (1), or obstetrics/gynecology department (1). The mean proportional LUS score was 15.3 (SD 1.6). Inter-rater agreement varied: 6 clips had full agreement, 3 had 19 of 20 raters agreeing, and 3 had 18 agreeing, while the remaining 13 had 17 or fewer raters agreeing on the assigned score. Scores 0 and 3 were more reproducible than scores 1 and 2. Fleiss’ kappa for overall answers was 0.87 (95% CI 0.815-0.931, p < 0.001).

CONCLUSIONS: The inter-rater agreement between experienced LUS operators is very high, although not perfect. The strong agreement and the small variance indicate that a measured LUS score, within a 20% tolerance, is a reliable estimate of the patient’s true LUS score, resulting in reduced variability in score interpretation and greater confidence in its clinical use.

PMID:39085969 | DOI:10.1186/s44158-024-00187-x