Neuropsychological test validation of speech markers of cognitive impairment in the Framingham Cognitive Aging Cohort

This study validates speech-based markers for detecting cognitive impairment by comparing original and expanded linguistic feature sets against neuropsychological tests. The research demonstrates that both the original and the expanded linguistic feature sets outperform the Mini-Mental State Examination (MMSE), a traditional screening tool, in classifying cognitive status.

Published on June 30, 2021

Authors

Larry Zhang, Anthony Ngo, Jason A. Thomas, Hannah A. Burkhardt, Carolyn M. Parsey, Rhoda Au, Reza Hosseini Ghomi

Read the full paper

Abstract

Although clinicians primarily diagnose dementia based on a combination of metrics such as medical history and formal neuropsychological tests, recent work using linguistic analysis of narrative speech to identify dementia has shown promising results. We aim to build upon this research by demonstrating the predictive capability of linguistic analysis in differentiating cognitively normal from cognitively impaired participants, and by comparing the performance of the original linguistic features with that of expanded features. Data were derived from a subset of the Framingham Heart Study (FHS) Cognitive Aging Cohort. We analyzed a sub-selection of 98 participants, which provided 127 unique audio files and clinical observations. We built on previous work, which extracted the original linguistic features from transcribed audio files, by extracting expanded features that include syntactic, semantic, and lexical information.
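To make the lexical side of this feature extraction concrete, the sketch below (not the paper's pipeline) computes two common vocabulary-breadth proxies from a transcript string; the tokenization rule and the example sentence are illustrative assumptions.

```python
# Minimal sketch of lexical feature extraction from a transcript.
# The tokenization rule and sample sentence are assumptions, not the study's code.
import re

def lexical_features(transcript: str) -> dict:
    """Type-token ratio and mean word length as simple vocabulary-breadth proxies."""
    tokens = re.findall(r"[a-zA-Z']+", transcript.lower())
    if not tokens:
        return {"type_token_ratio": 0.0, "mean_word_length": 0.0}
    return {
        "type_token_ratio": len(set(tokens)) / len(tokens),    # lexical diversity
        "mean_word_length": sum(map(len, tokens)) / len(tokens),
    }

print(lexical_features("The boy is reaching for the cookie jar while the stool tips."))
```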

Key Findings

Based on the area under the receiver operating characteristic (ROC) curve (AUC), both the original (AUC = 0.882) and expanded (AUC = 0.883) feature sets outperformed the MMSE (AUC = 0.870) in classifying cognitively impaired and cognitively normal participants. The expanded feature set showed better positive predictive value (PPV = 0.738) and negative predictive value (NPV = 0.889) than the original feature set (PPV = 0.701, NPV = 0.869). The expanded linguistic feature set also demonstrated stronger correlations with language-based neuropsychological tests, particularly the Logical Memory tasks, showing better specificity in characterizing cognitive deficits.
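For readers unfamiliar with these metrics, the following minimal sketch shows how AUC, PPV, and NPV would be computed from predicted probabilities and true labels; the 0.5 decision threshold and the toy values are assumptions, not the study's data.

```python
# Minimal sketch (not the authors' code): AUC, PPV, and NPV from
# predicted probabilities and binary impairment labels.
import numpy as np
from sklearn.metrics import roc_auc_score, confusion_matrix

def report_metrics(y_true, y_prob, threshold=0.5):
    """Summarize classifier performance with the metrics reported above."""
    auc = roc_auc_score(y_true, y_prob)            # area under the ROC curve
    y_pred = (y_prob >= threshold).astype(int)     # assumed 0.5 decision threshold
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    ppv = tp / (tp + fp)                           # positive predictive value
    npv = tn / (tn + fn)                           # negative predictive value
    return {"AUC": auc, "PPV": ppv, "NPV": npv}

# Toy illustration only; values are not from the study.
y_true = np.array([0, 0, 1, 1, 0, 1])
y_prob = np.array([0.2, 0.4, 0.7, 0.9, 0.3, 0.6])
print(report_metrics(y_true, y_prob))
```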

Methodology

The researchers drew on a subset of 141 unique participants from the FHS Cognitive Aging Cohort, analyzing 127 observations from 98 participants after excluding those with missing data. They extracted expanded linguistic features including syntactic features (sentence complexity, parse tree depth), semantic features (using Linguistic Inquiry and Word Count, LIWC 2015, categorization), and lexical features (vocabulary breadth measures). The study employed logistic regression with Lasso and ridge regularization under leave-one-out cross-validation to predict binary cognitive impairment status. Features were normalized, and class imbalance was handled with a weighted cross-entropy loss.
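A minimal sketch of this modeling setup follows, assuming scikit-learn as the toolkit: standardized features, L1- (Lasso) or L2- (ridge) penalized logistic regression with class weighting as a stand-in for the weighted cross-entropy loss, and leave-one-out cross-validation. The hyperparameters and toy data are illustrative, not the authors' exact choices.

```python
# Minimal sketch (assumptions noted) of the modeling setup described above.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.metrics import roc_auc_score

def fit_loo(X, y, penalty="l1", C=1.0):
    """Return out-of-fold predicted probabilities from leave-one-out CV."""
    model = make_pipeline(
        StandardScaler(),                      # normalize each feature
        LogisticRegression(
            penalty=penalty,                   # "l1" (Lasso) or "l2" (ridge)
            C=C,                               # inverse regularization strength
            solver="liblinear",                # supports both penalty types
            class_weight="balanced",           # stand-in for weighted cross-entropy
            max_iter=1000,
        ),
    )
    probs = cross_val_predict(model, X, y, cv=LeaveOneOut(), method="predict_proba")
    return probs[:, 1]

# Toy usage: rows of X are linguistic features per recording, y the impairment label.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 10))
y = rng.integers(0, 2, size=30)
print("AUC:", roc_auc_score(y, fit_loo(X, y)))
```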

Impact

This research advances the field of digital biomarkers for cognitive assessment by demonstrating that expanded linguistic analysis can provide clinical-grade performance in cognitive screening. The work suggests potential for incorporating speech recordings into general practice settings, offering earlier detection capabilities and reducing barriers to cognitive assessment. The findings indicate that decomposing language into syntactic, semantic, and lexical components provides better representation of linguistic behavior in cognitive decline, paving the way for more sophisticated automated cognitive assessment tools.

© 2025 Larry Zhang