Core Installation
Install from PyPI (recommended)
Or install from source (for development or wiki data)
git clone https://github.com/leeroo-ai/kapso.git
cd kapso
# Pull Git LFS files (wiki knowledge data)
git lfs install
git lfs pull
# Create conda environment (recommended)
conda create -n kapso_conda python=3.12
conda activate kapso_conda
# Install in development mode
pip install -e .
This repository uses Git LFS for large files in data/wikis_batch_top100/.
If you cloned before installing Git LFS, run git lfs install and then git lfs pull from the repository root to fetch the files.
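One way to confirm that the wiki data was actually fetched is to look for leftover Git LFS pointer stubs. The sketch below relies on the fact that an unfetched LFS file is a small text file whose first line starts with `version https://git-lfs.github.com/spec/v1`. The helper names `is_lfs_pointer` and `find_unfetched` are illustrative and not part of Kapso.

```python
from pathlib import Path

# First line of every Git LFS pointer file, per the Git LFS pointer format.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(data: bytes) -> bool:
    """Return True if the bytes look like an unfetched Git LFS pointer stub."""
    return data.startswith(LFS_POINTER_PREFIX)


def find_unfetched(root: str) -> list[str]:
    """List files under root that are still LFS pointers (hypothetical helper)."""
    unfetched = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            # Only the first few bytes are needed to recognise a pointer.
            with path.open("rb") as fh:
                head = fh.read(len(LFS_POINTER_PREFIX))
            if is_lfs_pointer(head):
                unfetched.append(str(path))
    return unfetched


if __name__ == "__main__":
    # Directory name taken from the note above.
    stubs = find_unfetched("data/wikis_batch_top100")
    print(f"{len(stubs)} file(s) still need 'git lfs pull'")
```

If the count is non-zero, running git lfs pull from the repository root should replace the stubs with the real files.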
Configure API keys
Create a .env file in the project root:
# Required for most operations
OPENAI_API_KEY=your-openai-api-key
# For Gemini coding agent
GOOGLE_API_KEY=your-google-api-key
# For Claude Code coding agent
ANTHROPIC_API_KEY=your-anthropic-api-key
# For Leeroopedia MCP (curated ML/AI knowledge; sign up at leeroopedia.com)
LEEROOPEDIA_API_KEY=your-leeroopedia-api-key
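Before running anything, it can help to check which keys the .env file actually defines. The following is a minimal sketch using a simplified KEY=VALUE parser written for illustration. It is not the loader Kapso itself uses (the source does not say which one that is), and treating only OPENAI_API_KEY as required follows the "Required for most operations" comment above.

```python
from pathlib import Path

# Key names come from the .env example above; only the first is marked required.
REQUIRED_KEYS = ["OPENAI_API_KEY"]
OPTIONAL_KEYS = ["GOOGLE_API_KEY", "ANTHROPIC_API_KEY", "LEEROOPEDIA_API_KEY"]


def parse_env_text(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Tolerate spaces around '=' and surrounding quotes.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_keys(values: dict[str, str], keys: list[str]) -> list[str]:
    """Return the keys that are absent or empty."""
    return [k for k in keys if not values.get(k)]


if __name__ == "__main__":
    env_path = Path(".env")
    values = parse_env_text(env_path.read_text()) if env_path.exists() else {}
    print("missing required:", missing_keys(values, REQUIRED_KEYS))
    print("missing optional:", missing_keys(values, OPTIONAL_KEYS))
```

A missing optional key only matters if you plan to use the corresponding agent or the Leeroopedia MCP.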
Coding Agent Setup
Kapso supports multiple coding agents. Install the ones you plan to use:
Aider (Default)
Claude Code
Gemini
OpenHands
Aider is installed automatically by pip install -e . and needs no additional setup. It uses git-centric, diff-based editing.
# Verify installation
aider --version
Claude Code requires Node.js and the Anthropic CLI:
# Install Claude Code CLI
npm install -g @anthropic-ai/claude-code
# Verify installation
claude --version
Set ANTHROPIC_API_KEY in your .env file.
Gemini uses the Google AI SDK, installed with core dependencies. Set GOOGLE_API_KEY in your .env file.
OpenHands has conflicting dependencies with aider-chat. Use a separate conda environment.
# Create separate environment
conda create -n openhands_env python=3.12
conda activate openhands_env
# Install OpenHands
pip install openhands-ai litellm
Do NOT install OpenHands in the same environment as Kapso.
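The per-agent verification commands above can be combined into one check. This sketch only tests whether the aider and claude executables are on PATH, using the command names from the verification steps. Gemini and OpenHands are left out because the text names no CLI for them. `detect_agents` is a hypothetical helper, not a Kapso function.

```python
import shutil
from typing import Callable, Optional

# Executable names taken from the "aider --version" and "claude --version" checks.
AGENT_COMMANDS = {"Aider": "aider", "Claude Code": "claude"}


def detect_agents(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> dict[str, bool]:
    """Map each agent name to whether its CLI is found on PATH."""
    return {name: which(cmd) is not None for name, cmd in AGENT_COMMANDS.items()}


if __name__ == "__main__":
    for name, found in detect_agents().items():
        print(f"{name}: {'found' if found else 'not found'}")
```

Run it inside the activated Kapso environment so that PATH reflects what Kapso will see.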
Leeroopedia MCP (Optional)
Connect Kapso to Leeroopedia — a curated knowledge base of 1000+ ML/AI frameworks. Kapso agents use it during ideation and implementation to search docs, build plans, verify code, diagnose failures, and look up hyperparameter defaults.
pip install leeroopedia-mcp
Sign up at app.leeroopedia.com for an API key ($20 free credit, no credit card required), then add to your .env:
LEEROOPEDIA_API_KEY=kpsk_your_key_here
See the Leeroopedia MCP docs for Claude Code and Cursor setup.
Benchmark Installation
MLE-Bench provides Kaggle competition problems.
Prerequisites:
Git LFS (sudo apt-get install git-lfs or brew install git-lfs)
Installation:
# Clone and install MLE-Bench
git clone https://github.com/openai/mle-bench.git
cd mle-bench
git lfs install
git lfs fetch --all
git lfs pull
pip install -e .
cd ..
# Install MLE-specific dependencies
pip install -r benchmarks/mle/requirements.txt
Verify:
PYTHONPATH=. python -m benchmarks.mle.runner --list
ALE-Bench provides AtCoder algorithmic optimization problems.
Prerequisites:
Docker
libcairo2-dev (sudo apt-get install -y libcairo2-dev)
Installation:
# Clone and install ALE-Bench
git clone https://github.com/SakanaAI/ALE-Bench.git
cd ALE-Bench
pip install .
pip install ".[eval]"
# Build Docker container for evaluation
bash ./scripts/docker_build_202301.sh $(id -u) $(id -g)
cd ..
Verify:
PYTHONPATH=. python -m benchmarks.ale.runner --list
Infrastructure Setup
Kapso uses Docker containers for its knowledge graph infrastructure:
Weaviate: Vector database for semantic search
Neo4j: Graph database for relationships
MediaWiki (optional): Web UI for browsing wiki pages
Quick Start (Recommended)
# Start all infrastructure
./scripts/start_infra.sh
# Stop infrastructure (data preserved)
./scripts/stop_infra.sh
# Stop and wipe all data
./scripts/stop_infra.sh --volumes
Default Service URLs
| Service | URL | Credentials |
| --- | --- | --- |
| MediaWiki | http://localhost:8090 | admin / adminpass123 |
| Neo4j Browser | http://localhost:7474 | neo4j / password |
| Weaviate | http://localhost:8080 | Anonymous (no auth) |
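After starting the infrastructure, a quick way to see whether each service is accepting connections is a plain TCP probe against the URLs listed above. This is a minimal sketch: it confirms only that something is listening on the port, not that the service is healthy or that the credentials work. `is_reachable` is an illustrative helper.

```python
import socket
from urllib.parse import urlparse

# URLs from the default service list above.
SERVICE_URLS = {
    "MediaWiki": "http://localhost:8090",
    "Neo4j Browser": "http://localhost:7474",
    "Weaviate": "http://localhost:8080",
}


def is_reachable(url: str, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parsed = urlparse(url)
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((parsed.hostname, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for name, url in SERVICE_URLS.items():
        status = "up" if is_reachable(url) else "down"
        print(f"{name:14} {url:24} {status}")
```

Containers can take some time to finish starting, so a service reported as down right after launch may simply need another try.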
Manual Setup
If you prefer to start services individually:
Docker Compose (All Services)
docker compose -f services/infrastructure/docker-compose.yml up -d
This starts:
Weaviate (port 8080) - Vector database for embeddings
Neo4j (ports 7474, 7687) - Graph database for relationships
MediaWiki (port 8090) - Web UI for browsing wiki pages
MariaDB - Backend database for MediaWiki
Weaviate (vector DB):
docker run -d --name weaviate \
  -p 8080:8080 -p 50051:50051 \
  -e AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED=true \
  -e PERSISTENCE_DATA_PATH='/var/lib/weaviate' \
  semitechnologies/weaviate:1.27.0
Neo4j (graph DB):
docker run -d --name neo4j \
  --restart unless-stopped \
  -p 7474:7474 -p 7687:7687 \
  -e NEO4J_AUTH=neo4j/password \
  neo4j:5.18.0
Add infrastructure settings to your .env:
# Neo4j connection
NEO4J_URI=bolt://localhost:7687
NEO4J_USER=neo4j
NEO4J_PASSWORD=password
# Weaviate connection
WEAVIATE_URL=http://localhost:8080
# MediaWiki (optional)
MW_BASE=http://localhost:8090
MW_USER=admin
MW_PASS=adminpass123
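To see which connection settings will take effect, the variables above can be resolved against their documented defaults. The sketch below assumes the common pattern of reading from the process environment with a fallback. How Kapso itself reads these values is not stated in the source, and `resolve_settings` is a hypothetical helper.

```python
import os
from typing import Mapping

# Defaults mirror the values shown in the .env snippet above.
DEFAULTS = {
    "NEO4J_URI": "bolt://localhost:7687",
    "NEO4J_USER": "neo4j",
    "NEO4J_PASSWORD": "password",
    "WEAVIATE_URL": "http://localhost:8080",
    "MW_BASE": "http://localhost:8090",
    "MW_USER": "admin",
    "MW_PASS": "adminpass123",
}


def resolve_settings(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Return each setting from env, falling back to the documented default."""
    return {key: env.get(key) or default for key, default in DEFAULTS.items()}


if __name__ == "__main__":
    for key, value in resolve_settings().items():
        # Mask secrets so they do not end up in terminal logs.
        shown = "***" if key.endswith(("PASSWORD", "PASS")) else value
        print(f"{key}={shown}")
```

If you changed a port mapping or password when starting the containers, the matching variable here must change too.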
Knowledge Graph Indexing
After infrastructure is running, index your wiki pages:
Using Kapso API (Recommended)
from kapso.kapso import Kapso
# Initialize Kapso
kapso = Kapso(config_path="src/config.yaml")
# Index wiki pages (one-time operation)
kapso.index_kg(
    wiki_dir="data/wikis",
    save_to="data/indexes/my_knowledge.index",
)
# Load existing index on subsequent runs
kapso = Kapso(
    config_path="src/config.yaml",
    kg_index="data/indexes/my_knowledge.index",
)
The .index file is a JSON reference to the indexed data:
{
  "version": "1.0",
  "created_at": "2025-01-15T10:30:00Z",
  "data_source": "data/wikis_llm_finetuning",
  "search_backend": "kg_graph_search",
  "backend_refs": {
    "weaviate_collection": "KapsoWiki",
    "embedding_model": "text-embedding-3-large"
  },
  "page_count": 99
}
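Because the .index file is plain JSON, it can be inspected before being passed to Kapso. The sketch below checks for the top-level fields that appear in the example above. Whether Kapso treats all of them as mandatory is an assumption, and `load_index` is an illustrative helper rather than part of the Kapso API.

```python
import json

# Top-level fields seen in the example .index file above.
EXPECTED_FIELDS = [
    "version", "created_at", "data_source",
    "search_backend", "backend_refs", "page_count",
]


def load_index(text: str) -> dict:
    """Parse .index JSON and raise ValueError if an expected field is absent."""
    data = json.loads(text)
    missing = [field for field in EXPECTED_FIELDS if field not in data]
    if missing:
        raise ValueError(f"index file is missing fields: {missing}")
    return data


if __name__ == "__main__":
    # Sample content copied from the example above.
    sample = """{
      "version": "1.0",
      "created_at": "2025-01-15T10:30:00Z",
      "data_source": "data/wikis_llm_finetuning",
      "search_backend": "kg_graph_search",
      "backend_refs": {
        "weaviate_collection": "KapsoWiki",
        "embedding_model": "text-embedding-3-large"
      },
      "page_count": 99
    }"""
    index = load_index(sample)
    print(index["search_backend"], index["page_count"])
```

The file only references the indexed data, so it is usable only while the Weaviate and Neo4j instances it points to still hold that data.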
KG Search Backends
| Backend | Data Format | Storage | Use Case |
| --- | --- | --- | --- |
| kg_graph_search | Wiki pages (.md/.mediawiki) | Weaviate + Neo4j | Semantic search with LLM reranking |
| kg_llm_navigation | JSON (nodes/edges) | Neo4j only | LLM-guided graph navigation |
Environment Variables Reference
| Variable | Required | Default | Description |
| --- | --- | --- | --- |
| OPENAI_API_KEY | Yes | - | OpenAI API key (also for embeddings) |
| GOOGLE_API_KEY | No | - | Google API key for Gemini |
| ANTHROPIC_API_KEY | No | - | Anthropic API key for Claude |
| LEEROOPEDIA_API_KEY | No | - | Leeroopedia API key (sign up at app.leeroopedia.com) |
| NEO4J_URI | No | bolt://localhost:7687 | Neo4j connection URI |
| NEO4J_USER | No | neo4j | Neo4j username |
| NEO4J_PASSWORD | No | password | Neo4j password |
| WEAVIATE_URL | No | http://localhost:8080 | Weaviate server URL |
| MW_BASE | No | http://localhost:8090 | MediaWiki base URL |
| MW_USER | No | admin | MediaWiki username |
| MW_PASS | No | adminpass123 | MediaWiki password |
| CUDA_DEVICE | No | 0 | GPU device for ML training |
Verify Installation
# Check core installation
python -c "from kapso.kapso import Kapso; print('Kapso OK')"
# Check orchestrator
python -c "from kapso.execution.orchestrator import OrchestratorAgent; print('Orchestrator OK')"
# Check knowledge search
python -c "from kapso.knowledge_base.search import KnowledgeSearchFactory; print('Knowledge Search OK')"
# Check MLE-Bench (if installed)
python -c "import mlebench; print('MLE-Bench OK')"
# Check ALE-Bench (if installed)
python -c "import ale_bench; print('ALE-Bench OK')"
# Check Neo4j driver
python -c "from neo4j import GraphDatabase; print('Neo4j driver OK')"
# Check Weaviate client
python -c "import weaviate; print('Weaviate client OK')"
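The individual checks above can also be run in one pass. This sketch uses importlib.util.find_spec from the standard library to test whether each module can be located. It is a lighter check than the one-liners above: it does not execute the target module and so will not catch errors raised at import time. The grouping into core and optional modules follows the comments above.

```python
import importlib.util

# Module names taken from the one-line checks above.
CORE_MODULES = ["kapso.kapso", "neo4j", "weaviate"]
OPTIONAL_MODULES = ["mlebench", "ale_bench"]


def is_installed(module: str) -> bool:
    """Return True if Python can locate the module on the current path."""
    try:
        return importlib.util.find_spec(module) is not None
    except (ImportError, ValueError):
        # find_spec raises when the parent package of a dotted name is missing.
        return False


if __name__ == "__main__":
    for module in CORE_MODULES:
        print(f"{module}: {'OK' if is_installed(module) else 'MISSING'}")
    for module in OPTIONAL_MODULES:
        state = "OK" if is_installed(module) else "not installed (optional)"
        print(f"{module}: {state}")
```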
Troubleshooting
# Check Docker logs
docker compose -f services/infrastructure/docker-compose.yml logs
# Check individual service
docker logs weaviate
docker logs neo4j
If default ports are in use, modify services/infrastructure/docker-compose.yml or use different port mappings.
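To find out which default port is already taken before editing the compose file, one option is to try binding each port locally. This is a minimal sketch meant to be run while the Kapso containers are stopped. The port numbers come from the Docker Compose service list above, and `port_is_free` is an illustrative helper.

```python
import socket

# Host ports published by the services described above.
DEFAULT_PORTS = {
    "Weaviate": 8080,
    "Neo4j HTTP": 7474,
    "Neo4j Bolt": 7687,
    "MediaWiki": 8090,
}


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can be bound to host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
            return True
        except OSError:
            return False


if __name__ == "__main__":
    for name, port in DEFAULT_PORTS.items():
        print(f"{name:11} {port}: {'free' if port_is_free(port) else 'IN USE'}")
```

If you remap a port that is in use, update the matching URL in your .env as well.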