Anal Chem. 2026 Apr 20. doi: 10.1021/acs.analchem.6c00429. Online ahead of print.
ABSTRACT
Rapid and precise discrimination of Escherichia coli strains, including both pathogenic and nonpathogenic variants, is essential for clinical diagnosis and food safety monitoring. Raman spectroscopy provides nondestructive molecular fingerprint information; however, its practical implementation is hindered by two principal challenges: the high spectral similarity among different strains and the scarcity of high-quality labeled data for training deep learning models. To address these limitations, we present an integrated deep learning framework that combines a Wasserstein generative adversarial network (WGAN) with a Transformer classifier (WGAN-Transformer). The WGAN first learns the underlying distribution of limited original Raman spectra and generates physically meaningful synthetic spectra, thereby effectively augmenting and balancing the training data set. The augmented data are then fed into a tailored Transformer classifier, which leverages a self-attention mechanism to extract subtle discriminative features from both global and local spectral characteristics. Under small-sample conditions (≈300 original spectra per strain), the framework increased the overall classification accuracy for eight representative Escherichia coli strains from approximately 74% (base Transformer) to over 97% in 5-fold cross-validation. Independent test-set validation confirmed strong generalization, with accuracy maintained above 94%. This work not only enables high-accuracy, nondestructive identification of closely related bacterial strains but also establishes a general “generative augmentation-discriminative deep analysis” paradigm, offering a robust methodological strategy for small-sample, high-complexity classification tasks in biospectroscopy and related analytical fields.
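The Wasserstein objective behind the augmentation step can be illustrated with a minimal sketch. It does not reproduce the authors' implementation: the linear critic, the three-point toy "spectra", and the function names are hypothetical stand-ins. The abstract also does not state how the critic is constrained, so the weight clipping of the original WGAN formulation is assumed here.

```python
def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)

def critic_score(weights, spectrum):
    """Hypothetical linear critic: a weighted sum of spectral intensities."""
    return sum(w * x for w, x in zip(weights, spectrum))

def critic_loss(weights, real_batch, fake_batch):
    """Critic loss E[D(fake)] - E[D(real)]; the critic is trained to minimise it."""
    real = mean([critic_score(weights, s) for s in real_batch])
    fake = mean([critic_score(weights, s) for s in fake_batch])
    return fake - real

def generator_loss(weights, fake_batch):
    """Generator loss -E[D(fake)]; the generator tries to raise the critic's score."""
    return -mean([critic_score(weights, s) for s in fake_batch])

def clip_weights(weights, c=0.01):
    """Weight clipping (assumed here) keeps the critic approximately Lipschitz."""
    return [max(-c, min(c, w)) for w in weights]

# Toy three-point "spectra"; real Raman spectra have many more channels.
real_batch = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]
fake_batch = [[0.0, 0.0, 0.0], [2.0, 2.0, 2.0]]

# Clip an arbitrary starting weight vector into [-0.01, 0.01].
weights = clip_weights([0.5, -0.5, 0.002])

print("clipped critic weights:", weights)
print("critic loss:", round(critic_loss(weights, real_batch, fake_batch), 6))
print("generator loss:", round(generator_loss(weights, fake_batch), 6))
```

In a full training loop the critic and generator would be neural networks updated alternately by gradient descent on these two losses. Once trained, the generator's outputs would be added to the training set for each strain before the classifier is fitted.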
PMID:42007758 | DOI:10.1021/acs.analchem.6c00429