[English](/CASSIA/) | [中文](/CASSIA/README_CN.html)
CASSIA (Collaborative Agent System for Single-cell Interpretable Annotation) is a tool that enhances cell type annotation using multi-agent Large Language Models (LLMs).
🌐 CASSIA Web UI (cassia.bio) - A web interface supporting most of CASSIA’s functionality
📚 Complete Documentation/Vignette (docs.cassia.bio)
🤖 LLMs Annotation Benchmark (sc-llm-benchmark.com)
2025-11-29 🎇 Major update with new features and improvements!
- Python Documentation: Complete Python docs and vignettes now available
- Annotation Boost Improvements: Sidebar navigation, better reports, bug fixes
- Better Scanpy Support: Fixed marker processing, improved R/Python sync
- Symphony Compare Update: Improved comparison module
- Batch Output & Ranking: Updated HTML output for runCASSIA_batch with new ranking method option
- Fuzzy Model Aliases: Easier model selection without remembering exact names
# Install dependencies
install.packages("devtools")
install.packages("reticulate")
# Install CASSIA
devtools::install_github("ElliotXie/CASSIA/CASSIA_R")
Note: If the environment is not set up correctly the first time, restart R and run the code below:
library(CASSIA)
setup_cassia_env()
It should take about 3 minutes to get your API key.
We recommend starting with OpenRouter since it provides access to most models through a single API key.
# For OpenRouter
setLLMApiKey("your_openrouter_api_key", provider = "openrouter", persist = TRUE)
# For OpenAI
setLLMApiKey("your_openai_api_key", provider = "openai", persist = TRUE)
# For Anthropic
setLLMApiKey("your_anthropic_api_key", provider = "anthropic", persist = TRUE)
CASSIA includes example marker data in two formats:
# Load example data
markers_unprocessed <- loadExampleMarkers(processed = FALSE) # Direct Seurat output
markers_processed <- loadExampleMarkers(processed = TRUE) # Processed format
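Before passing your own marker table to CASSIA, it can help to confirm that it looks like direct Seurat output. The sketch below is base R only and assumes the unprocessed format follows the standard column layout of Seurat's `FindAllMarkers()` (Seurat v4 and later). The helper `missing_marker_cols()` and the toy table are hypothetical illustrations, not part of CASSIA.

```r
# Columns produced by Seurat::FindAllMarkers() in Seurat v4 and later.
# Assumption: CASSIA's "unprocessed" input uses this same layout.
seurat_marker_cols <- c("p_val", "avg_log2FC", "pct.1", "pct.2",
                        "p_val_adj", "cluster", "gene")

# Hypothetical helper (not a CASSIA function):
# returns the expected column names that are absent from `markers`.
missing_marker_cols <- function(markers, expected = seurat_marker_cols) {
  stopifnot(is.data.frame(markers))
  setdiff(expected, colnames(markers))
}

# Toy table standing in for real FindAllMarkers() output.
toy_markers <- data.frame(
  p_val      = c(1e-10, 1e-8),
  avg_log2FC = c(2.1, 1.7),
  pct.1      = c(0.9, 0.8),
  pct.2      = c(0.1, 0.2),
  p_val_adj  = c(1e-6, 1e-4),
  cluster    = c("0", "1"),
  gene       = c("EPCAM", "CD3E")
)

missing_marker_cols(toy_markers)        # character(0): nothing is missing
missing_marker_cols(toy_markers[, -7])  # "gene": the 7th column was dropped
```

An empty result suggests the table has the usual Seurat shape. It does not guarantee that CASSIA will accept it.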
# The default provider is set to OpenRouter.
runCASSIA_pipeline(
output_file_name = "cassia_test", # Base name for output files
tissue = "Large Intestine", # Tissue type (e.g., "brain")
species = "Human", # Species (e.g., "human")
marker = markers_unprocessed, # Marker data from Seurat's FindAllMarkers()
max_workers = 4 # Number of parallel workers
)
You can choose any model for annotation and scoring. Some commonly used models are listed below. OpenRouter supports most currently popular models, although some have not been extensively benchmarked in the CASSIA paper. Feel free to experiment with them.
- gpt-5.1: Balanced option (Recommended)
- gpt-4o: Used in the benchmark
- google/gemini-2.5-flash: One of the best-performing low-cost models, comparable with models like gpt-4o (Recommended)
- deepseek/deepseek-chat-v3-0324: One of the best-performing open-source models, which gives very detailed annotations
- x-ai/grok-4-fast: One of the best-performing low-cost models
- claude-sonnet-4-5: The latest best-performing model (Most recommended)
The pipeline generates four key files:
# Check if API key is set correctly
key <- Sys.getenv("ANTHROPIC_API_KEY")
print(key) # Should not be empty
# Reset API key if needed
setLLMApiKey("your_api_key", provider = "anthropic", persist = TRUE)
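Printing a key in full can leak it into logs or shared notebooks. As a minimal base R sketch, the check above could be extended to report every provider at once while masking the values. Only `ANTHROPIC_API_KEY` appears in this README. The names `OPENROUTER_API_KEY` and `OPENAI_API_KEY` are assumptions based on common naming conventions, and `mask_key()` is a hypothetical helper, not a CASSIA function.

```r
# Hypothetical helper (not part of CASSIA):
# show only the first four characters of a key, or flag it as missing.
mask_key <- function(key) {
  if (!nzchar(key)) return("<not set>")
  paste0(substr(key, 1, 4), strrep("*", max(nchar(key) - 4, 0)))
}

# ANTHROPIC_API_KEY is taken from the check above.
# The other two variable names are assumed, not confirmed by this README.
provider_vars <- c("OPENROUTER_API_KEY", "OPENAI_API_KEY", "ANTHROPIC_API_KEY")

# Look up each variable and mask its value before printing.
key_status <- vapply(provider_vars,
                     function(v) mask_key(Sys.getenv(v)),
                     character(1))
print(key_status)
```

Any entry shown as `<not set>` means the corresponding environment variable is empty in the current R session.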
Note: This README covers only basic CASSIA functionality. For a complete tutorial including advanced features and detailed examples, please visit: CASSIA Complete Tutorial.
📖 Read our preprint (v2, latest)
📖 Original preprint (v1, historical)
CASSIA: a multi-agent large language model for reference-free, interpretable, and automated cell annotation of single-cell RNA-sequencing data
Elliot Xie, Lingxin Cheng, Jack Shireman, Yujia Cai, Jihua Liu, Chitrasen Mohanty, Mahua Dey, Christina Kendziorski
bioRxiv 2024.12.04.626476; doi: https://doi.org/10.1101/2024.12.04.626476
If you have any questions or need help, feel free to email us at xie227@wisc.edu. We are always happy to help. If you find this project helpful, please share it with your friends and give this repo a star ⭐ Many thanks!