In vivo cell-type and brain region classification via multimodal contrastive learning

  • 1Columbia University
  • 2Northwestern University
  • 3Boston University
  • 4University College London

  • 5Allen Institute
  • 6Champalimaud Foundation
  • 7Georgia Institute of Technology
  • 8University of Washington

Overview


Abstract

Current electrophysiological approaches can track the activity of many neurons, yet it is usually unknown which cell-types or brain areas are being recorded without further molecular or histological analysis. Developing accurate and scalable algorithms for identifying the cell-type and brain region of recorded neurons is thus crucial for improving our understanding of neural computation. In this work, we develop a multimodal contrastive learning approach for neural data that can be fine-tuned for different downstream tasks, including inference of cell-type and brain location. We utilize multimodal contrastive learning to jointly embed the activity autocorrelations and extracellular waveforms of individual neurons. We demonstrate that our embedding approach, Neuronal Embeddings via MultimOdal Contrastive Learning (NEMO), paired with supervised fine-tuning, achieves state-of-the-art cell-type classification for two opto-tagged datasets and brain region classification for the public International Brain Laboratory Brain-wide Map dataset. Our method represents a promising step towards accurate cell-type and brain region classification from electrophysiological recordings. Code is available at https://github.com/Haansololfp/NEMO.

Highlights

  • A multimodal contrastive learning method for electrophysiological data, Neuronal Embeddings via MultimOdal Contrastive Learning (NEMO).
  • Utilizes unlabeled data for pre-training and can be fine-tuned for different downstream tasks including cell-type and brain region classification (see the fine-tuning sketch after this list).
  • NEMO outperforms current unsupervised (PhysMAP and VAEs) and supervised methods, with particularly strong performance in label-limited regimes.
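
As a rough illustration of the pre-train-then-fine-tune workflow, the sketch below trains a small MLP head on top of a pre-trained encoder. The head size, optimizer, and the encoder/loader objects are placeholder assumptions, not the exact configuration used in the paper.

# Minimal sketch of fine-tuning a small MLP head on top of pre-trained embeddings
# for a downstream classification task (illustrative assumptions, not the paper's code).
import torch
import torch.nn as nn

class ClassifierHead(nn.Module):
    def __init__(self, emb_dim: int, n_classes: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(emb_dim, hidden), nn.ReLU(), nn.Linear(hidden, n_classes)
        )

    def forward(self, x):
        return self.net(x)

def finetune(encoder, head, loader, epochs=20, freeze_encoder=True):
    """Train the head (and optionally the encoder) on labeled data."""
    params = list(head.parameters())
    if not freeze_encoder:
        params += list(encoder.parameters())
    opt = torch.optim.Adam(params, lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:                      # x: raw inputs, y: cell-type or region labels
            with torch.set_grad_enabled(not freeze_encoder):
                z = encoder(x)                   # pre-trained embedding
            loss = loss_fn(head(z), y)
            opt.zero_grad()
            loss.backward()
            opt.step()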

Model schematic


NEMO utilizes a CLIP-based objective in which an extracellular action potential (EAP) encoder and an autocorrelogram (ACG) image encoder are trained to embed randomly augmented EAPs and ACG images from the same neuron close together while keeping different neurons separate. The learned representations can then be utilized for downstream tasks such as cell-type and brain region classification.
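
For concreteness, here is a minimal sketch of a symmetric CLIP-style (InfoNCE) objective over paired EAP and ACG-image embeddings. The encoders, augmentation functions, and temperature value are placeholders and not the exact NEMO implementation.

# Minimal sketch of a CLIP-style objective over paired EAP / ACG-image embeddings.
import torch
import torch.nn.functional as F

def clip_loss(eap_emb: torch.Tensor, acg_emb: torch.Tensor, temperature: float = 0.07):
    """Symmetric InfoNCE loss: the i-th EAP and i-th ACG image come from the same neuron."""
    eap_emb = F.normalize(eap_emb, dim=-1)
    acg_emb = F.normalize(acg_emb, dim=-1)
    logits = eap_emb @ acg_emb.t() / temperature            # (batch, batch) similarity matrix
    targets = torch.arange(logits.size(0), device=logits.device)
    loss_eap_to_acg = F.cross_entropy(logits, targets)      # match each EAP to its ACG image
    loss_acg_to_eap = F.cross_entropy(logits.t(), targets)  # and vice versa
    return 0.5 * (loss_eap_to_acg + loss_acg_to_eap)

# Usage with two modality-specific encoders on randomly augmented inputs
# (encoder and augmentation names below are hypothetical):
# eap_emb = eap_encoder(augment_eap(waveforms))
# acg_emb = acg_encoder(augment_acg(acg_images))
# loss = clip_loss(eap_emb, acg_emb)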

Cerebellum cell-type classification (Beau et al. 2025)


Visual cortex cell-type classification (Ye et al. 2024)


Comparison of NEMO to baseline models on two opto-tagged datasets: an NP Ultra visual cortex dataset (Ye et al. 2024) and a Neuropixels 1.0 cerebellum dataset (Beau et al. 2025). We show UMAP visualizations of the NEMO representations for held-out opto-tagged units, colored by cell-type. We also show the balanced accuracy and confusion matrices, normalized by the ground-truth label and averaged across 5 random seeds. NEMO outperforms the other embedding methods by a significant margin across all cell-types and evaluation methods.
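
A minimal sketch of this evaluation protocol, computing balanced accuracy and a ground-truth-normalized confusion matrix averaged over seeds with scikit-learn. The MLP readout on frozen embeddings is an illustrative stand-in for the classifiers compared in the paper.

# Minimal sketch: balanced accuracy and ground-truth-normalized confusion matrix,
# averaged across random seeds (classifier choice is an illustrative assumption).
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import balanced_accuracy_score, confusion_matrix

def evaluate_embeddings(train_z, train_y, test_z, test_y, seeds=(0, 1, 2, 3, 4)):
    accs, cms = [], []
    for seed in seeds:
        clf = MLPClassifier(hidden_layer_sizes=(128,), max_iter=500, random_state=seed)
        clf.fit(train_z, train_y)
        pred = clf.predict(test_z)
        accs.append(balanced_accuracy_score(test_y, pred))
        # normalize='true' divides each row by the number of ground-truth examples
        cms.append(confusion_matrix(test_y, pred, normalize='true'))
    return np.mean(accs), np.mean(cms, axis=0)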

Brain region classification for the Brain-Wide Map (BWM; International Brain Laboratory et al. 2024)


Results for NEMO on the IBL brain region classification task. We show a schematic of the multi-neuron classifier: at each depth, the neurons within 60 microns were used to classify the anatomical region, and if more than 5 neurons fell within that range, only the nearest 5 were selected. We average the logits of the single-neuron classifier (trained on NEMO embeddings) across the selected neurons, and the final prediction is based on this average. We show confusion matrices for the single-neuron and multi-neuron region classifiers using fine-tuned NEMO, averaged across 5 runs. We also show the single-neuron balanced accuracy with a linear classifier and with an MLP head for each model trained or fine-tuned with different label ratios, as well as the single-neuron MLP-classification balanced accuracy for each modality separately and for the combined representation.
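
A minimal sketch of the multi-neuron prediction rule described above (neurons within 60 microns of a depth, capped at the nearest 5, with averaged single-neuron logits). Function and variable names are hypothetical and not taken from the NEMO codebase.

# Minimal sketch of the multi-neuron region classifier: average single-neuron logits
# for up to 5 nearest neurons within 60 microns of a query depth.
import numpy as np

def multi_neuron_predict(depths, single_neuron_logits, query_depth,
                         radius_um=60.0, max_neurons=5):
    """depths: (n_neurons,) depth of each neuron along the probe, in microns.
    single_neuron_logits: (n_neurons, n_regions) logits from the single-neuron classifier."""
    dist = np.abs(depths - query_depth)
    nearby = np.where(dist <= radius_um)[0]
    if len(nearby) > max_neurons:                      # keep only the nearest 5 neurons
        nearby = nearby[np.argsort(dist[nearby])[:max_neurons]]
    avg_logits = single_neuron_logits[nearby].mean(axis=0)
    return avg_logits.argmax()                         # predicted region index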

Clustering and visualization of the NEMO representations for the BWM


IBL neuron clustering using NEMO. We show a UMAP visualization of the representations that NEMO extracts from the training data, colored by anatomical brain region, together with the same UMAP colored instead by cluster labels obtained with a graph-based approach (Louvain clustering). We tuned the UMAP neighborhood size and the clustering resolution; these parameters were selected by maximizing the modularity index, which also minimized the number of clusters. We show 2D brain slices across three brain views with the locations of individual neurons colored by cluster ID. The black lines show the region boundaries of the Allen mouse atlas (Wang et al. 2020). The cluster distribution found using NEMO closely correlates with the anatomical regions and is consistent across insertions from different labs.
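
A minimal sketch of this kind of pipeline, assuming a k-nearest-neighbor graph on the embeddings, Louvain community detection via networkx, and a 2D UMAP for plotting. The specific libraries and parameter values are assumptions; the tuning loop over neighborhood size and resolution described above is only indicated in the comments.

# Minimal sketch: kNN graph on embeddings -> Louvain clustering (with modularity score
# for parameter selection) -> 2D UMAP for visualization.
import numpy as np
import networkx as nx
import umap
from sklearn.neighbors import kneighbors_graph

def cluster_and_embed(embeddings, n_neighbors=15, resolution=1.0, seed=0):
    # Build a symmetric kNN graph on the embeddings
    knn = kneighbors_graph(embeddings, n_neighbors=n_neighbors, mode='connectivity')
    graph = nx.from_scipy_sparse_array(0.5 * (knn + knn.T))
    # Graph-based (Louvain) clustering and its modularity score;
    # n_neighbors and resolution would be tuned by maximizing this modularity.
    communities = nx.algorithms.community.louvain_communities(
        graph, resolution=resolution, seed=seed)
    modularity = nx.algorithms.community.modularity(graph, communities)
    labels = np.empty(len(embeddings), dtype=int)
    for cid, members in enumerate(communities):
        labels[list(members)] = cid
    # 2D UMAP of the same embeddings for visualization
    coords = umap.UMAP(n_neighbors=n_neighbors, random_state=seed).fit_transform(embeddings)
    return labels, coords, modularity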

BibTeX

If you find our data or project useful in your research, please cite:
 @inproceedings{yu2025in,
    title={In vivo cell-type and brain region classification via multimodal contrastive learning},
    author={Han Yu and Hanrui Lyu and YiXun Xu and Charlie Windolf and Eric Kenji Lee and Fan Yang and Andrew M Shelton and Olivier Winter and International Brain Laboratory and Eva L Dyer and Chandramouli Chandrasekaran and Nicholas A. Steinmetz and Liam Paninski and Cole Lincoln Hurwitz},
    booktitle={The Thirteenth International Conference on Learning Representations},
    year={2025},
    url={https://openreview.net/forum?id=10JOlFIPjt}
}

Acknowledgments

We thank Jonathan Pillow and Tatiana Engel for providing feedback on this manuscript. We also thank Maxime Beau and the other authors of Beau et al. 2025 for sharing the C4 cerebellum dataset. This project was supported by the Wellcome Trust (PRF 209558, 216324, 201225, and 224688 to MH, SHWF 221674 to LFR, collaborative award 204915 to MC, MH and TDH), National Institutes of Health (1U19NS123716), the Simons Foundation, the DoD OUSD (R&E) under Cooperative Agreement PHY-2229929 (The NSF AI Institute for Artificial and Natural Intelligence), the Kavli Foundation, the Gatsby Charitable Foundation (GAT3708), the NIH BRAIN Initiative (U01NS113252 to NAS, SRO, and TDH), the Pew Biomedical Scholars Program (NAS), the Max Planck Society (GL), the European Research Council under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 834446 to GL and AdG 695709 to MH), the Giovanni Armenise Harvard Foundation (CDA to LFR), the Human Technopole (ECF to LFR), the NSF (IOS 211500 to NBS), the Klingenstein-Simons Fellowship in Neuroscience (NAS), the NINDS R01NS122969, the NINDS R21NS135361, the NINDS F31NS131018, the NSF CAREER award IIS-2146072, as well as generous gifts from the McKnight Foundation, and the CIFAR Azrieli Global Scholars Program. GM is supported by a Boehringer Ingelheim Fonds PhD Fellowship. The primate research procedures were supported by the NIH P51 (OD010425) to the WaNPRC, and animal breeding was supported by NIH U42 (OD011123). Computational modeling work was supported by the European Union Horizon 2020 Research and Innovation Programme under Grant Agreement No. 945539 Human Brain Project SGA3 and No. 101147319 EBRAINS 2.0 (GTE and TVN). Computational resources for building machine learning models were provided by ACCESS, which is funded by the US National Science Foundation.