Automated voice biomarkers for depression symptoms using an online cross‐sectional data collection initiative

This study explores and validates voice features extracted from recorded speech samples as digital biomarkers for depression symptoms including suicidality, psychomotor disturbance, and depression severity. Voice features from depressed subjects predicted PHQ9 question 9 (suicidality) with an area under the curve of 0.821 and PHQ9 total score with a mean absolute error of 4.7.

Published on May 7, 2020

Authors

Larry Zhang, Radhika Duvvuri, Kiranmayi K. L. Chandra, Theresa Nguyen, Reza H. Ghomi


Abstract

Importance: Depression is an illness that affects a large percentage of the world’s population at some point in the lifetime. To date, no biomarker is available for depression detection, and tracking of symptoms relies on patient self‐report.

Objective: To explore and validate features extracted from recorded voice samples of depressed subjects as digital biomarkers for suicidality, psychomotor disturbance, and depression severity.

Design: We conducted a cross‐sectional study over the course of 12 months using a frequently visited web form version of the PHQ9 hosted by Mental Health America (MHA) to ask subjects for anonymous voice samples via a separate web form hosted by NeuroLex Laboratories. Subjects were asked to provide demographics, answers to the PHQ9, and two voice samples.

Setting: Online only.

Participants: Users of the MHA website.

Main Outcomes and Measures: Performance of statistical models using extracted voice features to predict psychomotor disturbance, suicidality, and depression severity as indicated by the PHQ9.

Results: Voice features extracted from recorded audio of depressed subjects were able to predict PHQ9 question 9 and total scores with an area under the curve of 0.821 and a mean absolute error of 4.7, respectively. Prediction of psychomotor disturbance was weaker, with an area under the curve of 0.61.

Conclusion and Relevance: Automated voice analysis using short recordings of patient speech may be used to augment depression screening and symptom management.

Key Findings

Methodology

The researchers conducted a 12-month cross-sectional study in partnership with Mental Health America, leveraging their high-traffic depression screening website to collect anonymous voice samples. The study used an innovative crowdsourced approach where users completing the PHQ9 depression questionnaire were invited to donate voice samples through a separate NeuroLex Laboratories web application.

Data Collection Protocol: Participants provided two voice samples: (1) reading the phrase “The quick brown fox jumps over the lazy dog” to capture phonemes and letters, and (2) giving a 30-second free speech sample for richer linguistic content. Data collection was limited to these tasks to minimize participant dropout while maintaining anonymity.

Voice Feature Extraction: Three feature extraction approaches were implemented: (1) acoustic features from the extended Geneva Minimalistic Acoustic Parameter Set (eGeMAPS), which has 88 features including F0, harmonics, loudness, and spectral characteristics; (2) prosodic features measuring speech timing, pause patterns, and rhythm using custom WebRTC-based analysis; and (3) linguistic features obtained through manual transcription and vectorization with count vectorization and n-gram TF-IDF methods.
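
As an illustration of the prosodic step, pause statistics can be derived from per-frame voice activity decisions of the kind a WebRTC-style detector produces. The sketch below is not the study's code: the function name `pause_features`, the 30 ms frame length, and the 150 ms minimum pause are all assumptions chosen for the example.

```python
from itertools import groupby


def pause_features(vad_flags, frame_ms=30, min_pause_ms=150):
    """Summarize pauses from per-frame voice-activity flags (True = speech).

    frame_ms and min_pause_ms are illustrative defaults, not study values.
    """
    if not vad_flags:
        raise ValueError("vad_flags must contain at least one frame")

    # Collapse the frame sequence into (is_speech, run_length) pairs.
    runs = [(is_speech, sum(1 for _ in group))
            for is_speech, group in groupby(vad_flags)]

    # Keep only non-speech runs long enough to count as a pause.
    pauses_ms = [n * frame_ms for is_speech, n in runs
                 if not is_speech and n * frame_ms >= min_pause_ms]
    speech_frames = sum(n for is_speech, n in runs if is_speech)

    return {
        "speech_ratio": speech_frames / len(vad_flags),
        "pause_count": len(pauses_ms),
        "mean_pause_s": sum(pauses_ms) / len(pauses_ms) / 1000 if pauses_ms else 0.0,
        "longest_pause_s": max(pauses_ms, default=0) / 1000,
    }


# Hypothetical frame decisions: speech, a 300 ms pause, speech, a 60 ms gap, speech.
flags = [True] * 10 + [False] * 10 + [True] * 5 + [False] * 2 + [True] * 3
features = pause_features(flags)
```

In this example the 60 ms gap falls below the assumed threshold, so only the 300 ms silence is counted as a pause.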

Data Preprocessing and Validation: Quality control included a minimum file size requirement (353 KB), voice activity detection to remove unvoiced samples, and noise reduction with a second-order Butterworth band-pass filter (300 Hz to 3.4 kHz). The final dataset included 390 valid audio files from 222 unique participants.
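
To make the band-pass step concrete, here is a minimal pure-Python sketch of a single second-order band-pass section over 300 Hz to 3.4 kHz. It uses the generic audio-EQ-cookbook biquad form as a stand-in; the study's exact Butterworth design and tooling are not described here, and the function names and the 16 kHz sample rate are assumptions.

```python
import math


def bandpass_biquad(samples, sample_rate, low_hz=300.0, high_hz=3400.0):
    """Apply one second-order band-pass section with 0 dB gain at the centre."""
    center_hz = math.sqrt(low_hz * high_hz)   # geometric centre of the pass band
    q = center_hz / (high_hz - low_hz)        # quality factor from the bandwidth
    w0 = 2.0 * math.pi * center_hz / sample_rate
    alpha = math.sin(w0) / (2.0 * q)

    b0, b1, b2 = alpha, 0.0, -alpha
    a0, a1, a2 = 1.0 + alpha, -2.0 * math.cos(w0), 1.0 - alpha

    x1 = x2 = y1 = y2 = 0.0                   # previous inputs and outputs
    out = []
    for x in samples:
        y = (b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2) / a0
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out


def rms(values):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def sine(freq_hz, sample_rate, seconds=1.0):
    """Unit-amplitude test tone."""
    n = int(sample_rate * seconds)
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


SAMPLE_RATE = 16000  # assumed sample rate for the demonstration
in_band = bandpass_biquad(sine(1000.0, SAMPLE_RATE), SAMPLE_RATE)
mains_hum = bandpass_biquad(sine(50.0, SAMPLE_RATE), SAMPLE_RATE)
```

A 1 kHz tone near the centre of the band passes almost unchanged, while a 50 Hz tone (typical of mains hum) is strongly attenuated, which is the behaviour a speech-band filter is meant to provide.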

Statistical Modeling: Gradient-boosted tree models with ElasticNet regularization were used for binary classification of symptoms and for regression of depression severity. Models were evaluated with five-fold cross-validation, using SMOTE up-sampling to address class imbalance, with area under the curve (AUC) as the classification metric and mean absolute error (MAE) as the regression metric.
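
For readers unfamiliar with the two evaluation metrics, the following sketch computes them from first principles. AUC is calculated as the fraction of positive/negative pairs in which the positive case receives the higher score (ties count as half). All data values and variable names are hypothetical and are not taken from the study.

```python
def roc_auc(labels, scores):
    """Probability that a random positive case outscores a random negative one."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5   # a tie counts as half a win
    return wins / (len(positives) * len(negatives))


def mean_absolute_error(actual, predicted):
    """Average absolute difference between true and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical example: binary suicidality labels with model scores,
# and PHQ9 total scores with model predictions.
suicidality_labels = [0, 0, 1, 1]
suicidality_scores = [0.1, 0.4, 0.35, 0.8]
phq9_totals = [3, 10, 20]
phq9_predicted = [5, 9, 14]

auc = roc_auc(suicidality_labels, suicidality_scores)      # 0.75
mae = mean_absolute_error(phq9_totals, phq9_predicted)     # 3.0
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, while MAE is expressed in the same units as the PHQ9 total score.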

Impact

This research represents a pioneering approach to depression assessment through digital biomarkers, with significant implications for mental health screening, monitoring, and accessibility. The study demonstrates the feasibility of using brief voice samples for objective depression assessment, potentially addressing limitations of traditional self-report measures.

Clinical Assessment Innovation: Predicting suicidality with an AUC of 0.821 from brief voice recordings (a read sentence and a 30-second free speech sample) points to a potentially more objective and accessible method of suicide risk screening than traditional questionnaires. This could enable earlier intervention and more frequent monitoring of at-risk individuals.

Scalable Mental Health Screening: The crowdsourced data collection approach demonstrates potential for large-scale, cost-effective depression screening through existing web platforms. With Mental Health America’s website receiving tens of thousands of PHQ9 completions monthly, this method could facilitate population-level mental health surveillance.

Digital Therapeutics Foundation: The identification of specific voice biomarkers for depression symptoms establishes groundwork for digital therapeutic applications, including smartphone-based monitoring tools and voice-enabled mental health applications that could provide continuous, unobtrusive assessment of symptom changes.

Research Methodology Advancement: The study’s online-only, anonymous data collection methodology offers a model for mental health research that overcomes traditional barriers including geographical limitations, stigma concerns, and recruitment challenges, potentially enabling more diverse and representative research populations.

Future Clinical Integration: Performance comparable to that of approaches based on clinical interview recordings (DAIC-WOZ) suggests potential for integrating voice biomarkers into clinical workflows, electronic health records, and telemedicine platforms to augment traditional depression screening and monitoring approaches.

© 2025 Larry Zhang