An open-source platform for reproducible machine learning analysis of AIRR data — covering classification, clustering, generative modeling, simulation, and visualization.
An adaptive immune receptor repertoire (AIRR) is the full set of T- or B-cell receptor sequences in an individual: a molecular record of past and ongoing immune responses to pathogens, vaccines, and disease. Understanding the patterns encoded in AIRR data opens opportunities for diagnostic, prognostic, and therapeutic applications.
immuneML handles the infrastructure — data import, preprocessing, subsampling, encoding, hyperparameter optimization, and reporting — so that ML researchers can focus on developing new methods, and immunologists on biological questions. Analyses are defined in shareable YAML specification files, ensuring full reproducibility.
The latest immuneML release extends the platform to cover unsupervised ML — clustering, generative modeling, protein language model embeddings, and dimensionality reduction — alongside the existing supervised classification workflows.
immuneML provides components for the full ML analysis lifecycle, from data import to result reporting, in a unified and extensible framework.
immuneML is available as a Python package, a Docker image, and a conda package. Full documentation, including tutorials, YAML examples, and developer guides, is at docs.immuneml.uio.no.
Every immuneML analysis is defined in a human-readable YAML specification file. Running it produces a structured HTML report along with exported models, raw data, and all intermediate results.
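To make the idea of a specification file concrete, here is a minimal Python sketch that assembles a small specification and writes it to disk. Everything in it is an assumption for illustration rather than a verified immuneML schema: the two top-level sections (`definitions`, `instructions`), the component names (`AIRR`, `SequenceLengthDistribution`, `ExploratoryAnalysis`), the parameter keys, and the helper functions `build_spec`, `check_references`, `render_spec`, and `write_spec`. Check the real keys against the YAML examples at docs.immuneml.uio.no before use. To stay within the standard library, the sketch serializes to JSON, which YAML parsers generally accept as valid input.

```python
import json
from pathlib import Path


def build_spec(dataset_path: str, metadata_path: str) -> dict:
    """Assemble an illustrative specification as a plain dictionary.

    All keys and component names are assumptions for illustration,
    not a verified immuneML schema.
    """
    definitions = {
        "datasets": {
            "my_dataset": {
                "format": "AIRR",  # assumed import format name
                "params": {
                    "path": dataset_path,
                    "metadata_file": metadata_path,
                    "is_repertoire": True,
                },
            }
        },
        # assumed report name, used here only as a placeholder
        "reports": {"my_report": "SequenceLengthDistribution"},
    }
    instructions = {
        "my_analysis": {
            "type": "ExploratoryAnalysis",  # assumed instruction type
            "analyses": {
                "length_overview": {"dataset": "my_dataset", "report": "my_report"}
            },
        }
    }
    return {"definitions": definitions, "instructions": instructions}


def check_references(spec: dict) -> list:
    """Return names used by instructions that are missing from definitions."""
    missing = []
    datasets = spec["definitions"].get("datasets", {})
    reports = spec["definitions"].get("reports", {})
    for instruction in spec["instructions"].values():
        for analysis in instruction.get("analyses", {}).values():
            if analysis["dataset"] not in datasets:
                missing.append(analysis["dataset"])
            if analysis["report"] not in reports:
                missing.append(analysis["report"])
    return missing


def render_spec(spec: dict) -> str:
    """Serialize the spec as JSON text, which YAML parsers generally accept."""
    return json.dumps(spec, indent=2)


def write_spec(spec: dict, path: str) -> Path:
    """Write the rendered spec to a file and return its path."""
    target = Path(path)
    target.write_text(render_spec(spec), encoding="utf-8")
    return target
```

Calling `write_spec(build_spec("data/", "data/metadata.csv"), "specs.yaml")` would produce a file that can be shared alongside the data. Keeping the whole analysis in one declarative file like this is what lets someone else rerun it unchanged. The exact command for running a specification is described in the documentation.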
Three use cases from the 2026 preprint demonstrate clustering, generative modeling, and exploratory analysis workflows on both synthetic and experimental AIRR datasets.
If you use immuneML in your research, please cite the relevant paper(s) below.
Questions, bug reports, or contributions — we're happy to hear from you.