forager: A Python package and web interface for modeling mental search

Authors

Abhilasha A. Kumar, Molly Apsel, Larry Zhang, Nancy Xing, and Michael N. Jones

Abstract

Analyzing data from the verbal fluency task (e.g., “name all the animals you can in a minute”) is of interest to both memory researchers and clinicians due to its broader implications for memory search and retrieval. Recent work has proposed several computational models to examine nuanced differences in search behavior, which can provide insights into the mechanisms underlying memory search. A prominent account of memory search within the fluency task was proposed by Hills, Jones, and Todd (2012), where mental search is modeled after how animals forage for food in physical space. Despite the broad potential utility of these models to scientists and clinicians, there is currently no open-source program to apply and compare existing foraging models or clustering algorithms without extensive, often redundant programming. To remove this barrier to studying search patterns in the fluency task, we created forager, a Python package (https://github.com/thelexiconlab/forager) and web interface (https://forager.research.bowdoin.edu/). forager provides multiple automated methods to designate clusters and switches within a fluency list, implements a novel set of computational models that can examine the influence of multiple lexical sources (semantic, phonological, and frequency) on memory search using semantic embeddings, and also enables researchers to evaluate relative model performance at the individual and group level. The package and web interface cater to users with various levels of programming experience.

Key Findings

Tool Development: Created the first comprehensive open-source toolkit for analyzing verbal fluency data using computational foraging models, addressing a significant gap in accessible research tools for memory search analysis.
Multiple Clustering Methods: Implemented five distinct cluster-switch designation methods: norms-based associative search, norms-based categorical search, similarity drop, delta similarity, and a novel multimodal similarity drop method that incorporates both semantic and phonological information.
Foraging Model Implementation: Developed computational models based on optimal foraging theory that examine how individuals use semantic, phonological, and frequency-based cues during memory search, extending beyond traditional static models to include dynamic and phonology-based variants.
Clinical Validation: Demonstrated the tool’s utility with clinical populations by replicating findings from psychosis research, showing that individuals with schizophrenia search differently from healthy controls, with greater reliance on frequency cues and reduced semantic clustering.
Accessibility Features: Created both a command-line Python package and a user-friendly web interface to accommodate users with varying programming expertise, supported by comprehensive documentation and example datasets.

Methodology

The authors built forager around four components: data processing, switch detection, computational modeling, and output generation. The package processes verbal fluency lists through the following stages:

Data Input and Preprocessing: Users upload fluency lists in text/CSV format with participant identifiers and word sequences. Out-of-vocabulary items are handled under one of three policies (exclude, truncate, or replace with mean vectors), and misspellings are corrected automatically by matching each item to the vocabulary using Levenshtein distance.
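The preprocessing step can be illustrated with a minimal sketch. The function names, the `max_distance` cutoff, and the `"UNK"` placeholder are hypothetical and are not taken from forager's API. The sketch also assumes that "truncate" cuts the list at the first unresolved item and that the mean-vector replacement can be represented by a placeholder token.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def closest_word(word, vocab, max_distance=2):
    """Return the nearest vocabulary item, or None if it is too far away.

    max_distance is an assumed cutoff chosen for illustration.
    """
    best = min(vocab, key=lambda v: levenshtein(word, v))
    return best if levenshtein(word, best) <= max_distance else None


def apply_oov_policy(fluency_list, vocab, policy="exclude", max_distance=2):
    """Clean one fluency list under an assumed reading of the three policies."""
    cleaned = []
    for word in fluency_list:
        if word in vocab:
            cleaned.append(word)
            continue
        fix = closest_word(word, vocab, max_distance)
        if fix is not None:
            cleaned.append(fix)    # treated as a misspelling
        elif policy == "exclude":
            continue               # drop the item, keep the rest of the list
        elif policy == "truncate":
            break                  # assumed: stop at the first unresolved item
        elif policy == "replace":
            cleaned.append("UNK")  # assumed stand-in for a mean embedding vector
        else:
            raise ValueError(f"unknown policy: {policy}")
    return cleaned


vocab = ["dog", "cat", "lion"]
responses = ["dog", "catt", "zzzzzz", "lion"]
for policy in ("exclude", "truncate", "replace"):
    print(policy, apply_oov_policy(responses, vocab, policy))
```

Here "catt" is one edit away from "cat" and is corrected under every policy, while "zzzzzz" is too far from any vocabulary item and is handled according to the chosen policy.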

Lexical Data Infrastructure: Built lexical databases for 2,045 animal category items, comprising 512-dimensional semantic embeddings from the Universal Sentence Encoder, word frequency data from the Google Books N-gram corpus, and pairwise similarity matrices for both semantic (cosine similarity) and phonological (normalized edit distance) relationships.
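The two similarity measures can be sketched as follows. The toy two-dimensional embeddings and phoneme sequences are invented for illustration, and the normalization used here (one minus the edit distance divided by the longer sequence length) is an assumption about what "normalized edit distance" means, not a confirmed detail of forager.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def edit_distance(s, t):
    """Edit distance between two sequences (here, tuples of phonemes)."""
    prev = list(range(len(t) + 1))
    for i, x in enumerate(s, start=1):
        curr = [i]
        for j, y in enumerate(t, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = curr
    return prev[-1]


def phonological_similarity(p1, p2):
    """Assumed normalization: 1 - distance / length of the longer sequence."""
    return 1.0 - edit_distance(p1, p2) / max(len(p1), len(p2))


def similarity_matrix(items, features, sim):
    """Pairwise similarity matrix, rows and columns ordered as in `items`."""
    return [[sim(features[a], features[b]) for b in items] for a in items]


# Toy data (hypothetical): real embeddings would have 512 dimensions.
items = ["cat", "bat", "whale"]
embeddings = {"cat": [1.0, 0.0], "bat": [0.8, 0.6], "whale": [0.0, 1.0]}
phonemes = {"cat": ("K", "AE", "T"), "bat": ("B", "AE", "T"), "whale": ("W", "EY", "L")}

semantic = similarity_matrix(items, embeddings, cosine_similarity)
phonological = similarity_matrix(items, phonemes, phonological_similarity)
print(semantic[0])      # similarities of "cat" to each item
print(phonological[0])
```

Precomputing both matrices once means that later stages only need table lookups when scoring transitions between consecutive responses.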

Switch Detection Algorithms: Implemented five cluster-switch designation methods: (1) norms-based associative search using hand-coded animal subcategories, (2) norms-based categorical search requiring shared categories across cluster items, (3) similarity drop detection based on semantic similarity patterns, (4) delta similarity with configurable thresholds for rise/fall parameters, and (5) multimodal similarity drop, which applies the similarity drop rule to a weighted combination of semantic and phonological similarity.
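As an illustration of methods (3) and (5), the sketch below marks a switch wherever the similarity between consecutive items falls and then rises again. The function names, the label coding (1 = switch, 0 = cluster, None = undetermined), the treatment of list edges, and the single mixing weight `alpha` are assumptions made for this example and may differ from forager's implementation. The norms-based and delta similarity methods are not shown.

```python
def similarity_drop(sims):
    """Label each item of a fluency list as switch (1) or cluster (0).

    sims[k] is the similarity between item k and item k + 1, so a list of
    n items yields n - 1 values. Item i is a switch when its similarity to
    the previous item is lower than both neighbouring transitions.
    Items without both neighbouring transitions are left as None (assumed).
    """
    n_items = len(sims) + 1
    labels = [None] * n_items
    for i in range(2, n_items - 1):
        before, here, after = sims[i - 2], sims[i - 1], sims[i]
        labels[i] = 1 if (before > here and here < after) else 0
    return labels


def multimodal_sims(semantic, phonological, alpha=0.5):
    """Assumed weighting: alpha * semantic + (1 - alpha) * phonological."""
    return [alpha * s + (1 - alpha) * p for s, p in zip(semantic, phonological)]


# Toy consecutive similarities for a six-item list (hypothetical values).
semantic = [0.8, 0.7, 0.2, 0.6, 0.5]
phonological = [0.1, 0.1, 0.9, 0.1, 0.1]

print(similarity_drop(semantic))
print(similarity_drop(multimodal_sims(semantic, phonological, alpha=0.5)))
```

In the toy data the semantic-only rule flags the fourth item as a switch, whereas the multimodal version does not, because the high phonological similarity at that transition offsets the semantic drop.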

Computational Modeling: Developed six foraging models including static models using semantic similarity and frequency, dynamic models with different cues for local (within-cluster) and global (between-cluster) transitions, and novel phonology-based variants that incorporate phonological similarity as an additional search cue.
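The distinction between static and dynamic models can be sketched as a likelihood computation in the style of Hills et al. (2012), where each candidate word is scored by a product of cue values raised to fitted exponents. All names, the toy lexicon, and the choice to use frequency alone for the first item and at switches are assumptions for illustration. The phonology-based variants and the parameter fitting itself are omitted.

```python
import math


def transition_nll(target, prev, vocab, freq, sim, beta_freq, beta_sem, use_sem=True):
    """Negative log-probability of producing `target` after `prev`."""
    def score(word):
        s = freq[word] ** beta_freq              # global cue: frequency
        if use_sem and prev is not None:
            s *= sim[prev][word] ** beta_sem     # local cue: similarity to previous item
        return s

    denom = sum(score(w) for w in vocab)
    return -math.log(score(target) / denom)


def model_nll(fluency_list, switches, vocab, freq, sim, beta_freq, beta_sem, dynamic=False):
    """Sum of negative log-likelihoods over one fluency list.

    Static model: frequency and semantic similarity at every transition.
    Dynamic model (assumed form): frequency only at switch items.
    """
    total, prev = 0.0, None
    for word, is_switch in zip(fluency_list, switches):
        use_sem = not (dynamic and is_switch)
        total += transition_nll(word, prev, vocab, freq, sim,
                                beta_freq, beta_sem, use_sem)
        prev = word
    return total


# Toy lexicon (hypothetical values; similarities must be positive).
vocab = ["cat", "dog", "whale"]
freq = {"cat": 0.5, "dog": 0.4, "whale": 0.1}
sim = {
    "cat": {"cat": 1.0, "dog": 0.8, "whale": 0.2},
    "dog": {"cat": 0.8, "dog": 1.0, "whale": 0.3},
    "whale": {"cat": 0.2, "dog": 0.3, "whale": 1.0},
}

responses = ["cat", "dog", "whale"]
switches = [0, 0, 1]
print(model_nll(responses, switches, vocab, freq, sim, 1.0, 2.0))
print(model_nll(responses, switches, vocab, freq, sim, 1.0, 2.0, dynamic=True))
```

A lower summed negative log-likelihood indicates a better account of the list, which is the quantity one would minimize over the exponents when fitting such a model and compare across model variants.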

Validation and Demonstration: Validated the tool on two existing datasets: fluency lists from 141 participants in Hills et al. (2012), and clinical data from 86 individuals in Lundin et al. (2020), spanning schizophrenia, schizotypal, and healthy control groups.

Impact

forager represents a significant advancement in computational tools for memory research with broad implications for both scientific and clinical communities. The package democratizes access to sophisticated foraging models that were previously limited to experts with extensive programming skills, potentially accelerating research in semantic memory and cognitive search processes.

Research Impact: The tool enables systematic comparison of different clustering methods and computational models, addressing long-standing issues with replication and standardization in verbal fluency research. By providing standardized implementations of multiple approaches, forager facilitates meta-analyses and cross-study comparisons that were previously difficult due to methodological variations.

Clinical Applications: The package’s demonstrated utility with clinical populations, particularly in replicating findings about altered search patterns in psychosis, suggests its potential as an aid to diagnosis and assessment. The ability to quantify subtle differences in memory search strategies could enhance neuropsychological evaluation and contribute to early detection of cognitive impairments in conditions such as schizophrenia, Alzheimer’s disease, and ADHD.

Open Science Advancement: By releasing both the source code and a user-friendly web interface, the authors have created a sustainable resource that promotes reproducible research and collaborative development. The comprehensive documentation, example datasets, and multiple access modalities (Python package, web interface, Google Colaboratory) ensure broad accessibility across different user communities.

Future Research Directions: The tool’s modular architecture and extensible design enable researchers to develop new clustering algorithms, incorporate additional lexical features, and adapt the framework to other semantic categories beyond animals. This flexibility positions forager as a platform for advancing theoretical understanding of memory search mechanisms and developing new computational approaches to studying semantic memory organization.