
Core Installation

1. Install from PyPI (recommended)

pip install leeroo-kapso
2. Or install from source (for development or wiki data)

git clone https://github.com/leeroo-ai/kapso.git
cd kapso

# Pull Git LFS files (wiki knowledge data)
git lfs install
git lfs pull

# Create conda environment (recommended)
conda create -n kapso_conda python=3.12
conda activate kapso_conda

# Install in development mode
pip install -e .
This repository uses Git LFS for large files in data/wikis_batch_top100/. If you didn't install Git LFS before cloning, run git lfs install and git lfs pull (as above) from inside the repository to fetch them.
3. Configure API keys

Create .env in project root:
# Required for most operations
OPENAI_API_KEY=your-openai-api-key

# For Gemini coding agent
GOOGLE_API_KEY=your-google-api-key

# For Claude Code coding agent
ANTHROPIC_API_KEY=your-anthropic-api-key

# For Leeroopedia MCP (curated ML/AI knowledge — sign up at leeroopedia.com)
LEEROOPEDIA_API_KEY=your-leeroopedia-api-key
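
Kapso reads these keys from the environment; a quick self-check that the file parses and the required key is present takes a few lines of stdlib Python. This is a minimal sketch: the simple KEY=value parsing is an assumption (it ignores quoting, export prefixes, and multi-line values); a library such as python-dotenv handles the full format.

```python
import os
import tempfile

def load_env(path=".env"):
    """Parse simple KEY=value lines from a .env file into a dict."""
    env = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue  # skip blank lines and comments
            key, _, value = line.partition("=")
            env[key.strip()] = value.strip()
    return env

# Demo against a throwaway file; point load_env() at your real .env instead.
with tempfile.NamedTemporaryFile("w", suffix=".env", delete=False) as f:
    f.write("# Required for most operations\nOPENAI_API_KEY=sk-test\n")
env = load_env(f.name)
assert "OPENAI_API_KEY" in env  # required key is present
os.unlink(f.name)
```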

Coding Agent Setup

Kapso supports multiple coding agents. Install the ones you plan to use:
Aider is installed automatically by pip install -e . (no additional setup required). It uses git-centric, diff-based editing.
# Verify installation
aider --version

Leeroopedia MCP (Optional)

Connect Kapso to Leeroopedia — a curated knowledge base of 1000+ ML/AI frameworks. Kapso agents use it during ideation and implementation to search docs, build plans, verify code, diagnose failures, and look up hyperparameter defaults.
pip install leeroopedia-mcp
Sign up at app.leeroopedia.com for an API key ($20 free credit, no credit card required), then add to your .env:
LEEROOPEDIA_API_KEY=kpsk_your_key_here
See the Leeroopedia MCP docs for Claude Code and Cursor setup.

Benchmark Installation

MLE-Bench provides Kaggle competition problems.
Prerequisites:
  • Git LFS (sudo apt-get install git-lfs or brew install git-lfs)
Installation:
# Clone and install MLE-Bench
git clone https://github.com/openai/mle-bench.git
cd mle-bench
git lfs install
git lfs fetch --all
git lfs pull
pip install -e .
cd ..

# Install MLE-specific dependencies
pip install -r benchmarks/mle/requirements.txt
Verify:
PYTHONPATH=. python -m benchmarks.mle.runner --list

Infrastructure Setup

Kapso uses Docker containers for its knowledge graph infrastructure:
  • Weaviate: Vector database for semantic search
  • Neo4j: Graph database for relationships
  • MediaWiki (optional): Web UI for browsing wiki pages
# Start all infrastructure
./scripts/start_infra.sh

# Stop infrastructure (data preserved)
./scripts/stop_infra.sh

# Stop and wipe all data
./scripts/stop_infra.sh --volumes

Default Service URLs

Service         URL                      Credentials
MediaWiki       http://localhost:8090    admin / adminpass123
Neo4j Browser   http://localhost:7474    neo4j / password
Weaviate        http://localhost:8080    Anonymous (no auth)
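
A quick way to confirm these services are reachable is a TCP port probe. A minimal stdlib sketch (ports taken from the table above; it only checks that something is listening, not that the service is healthy):

```python
import socket

def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Ports from the table above (7687 is Neo4j's bolt port).
for name, port in [("MediaWiki", 8090), ("Neo4j", 7687), ("Weaviate", 8080)]:
    status = "up" if port_open("localhost", port) else "DOWN"
    print(f"{name:10s} port {port}: {status}")
```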

Manual Setup

If you prefer to start services individually:
docker compose -f services/infrastructure/docker-compose.yml up -d
This starts:
  • Weaviate (port 8080) - Vector database for embeddings
  • Neo4j (ports 7474, 7687) - Graph database for relationships
  • MediaWiki (port 8090) - Web UI for browsing wiki pages
  • MariaDB - Backend database for MediaWiki
Weaviate (vector DB):
docker run -d --name weaviate \
    -p 8080:8080 -p 50051:50051 \
    -e AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED=true \
    -e PERSISTENCE_DATA_PATH='/var/lib/weaviate' \
    semitechnologies/weaviate:1.27.0
Neo4j (graph DB):
docker run -d --name neo4j \
    --restart unless-stopped \
    -p 7474:7474 -p 7687:7687 \
    -e NEO4J_AUTH=neo4j/password \
    neo4j:5.18.0

Configure Environment

Add infrastructure settings to your .env:
# Neo4j connection
NEO4J_URI=bolt://localhost:7687
NEO4J_USER=neo4j
NEO4J_PASSWORD=password

# Weaviate connection
WEAVIATE_URL=http://localhost:8080

# MediaWiki (optional)
MW_BASE=http://localhost:8090
MW_USER=admin
MW_PASS=adminpass123

Knowledge Graph Indexing

After infrastructure is running, index your wiki pages:
from kapso.kapso import Kapso

# Initialize Kapso
kapso = Kapso(config_path="src/config.yaml")

# Index wiki pages (one-time operation)
kapso.index_kg(
    wiki_dir="data/wikis",
    save_to="data/indexes/my_knowledge.index",
)

# Load existing index on subsequent runs
kapso = Kapso(
    config_path="src/config.yaml",
    kg_index="data/indexes/my_knowledge.index",
)

Index File Format

The .index file is a JSON reference to the indexed data:
{
  "version": "1.0",
  "created_at": "2025-01-15T10:30:00Z",
  "data_source": "data/wikis_llm_finetuning",
  "search_backend": "kg_graph_search",
  "backend_refs": {
    "weaviate_collection": "KapsoWiki",
    "embedding_model": "text-embedding-3-large"
  },
  "page_count": 99
}
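
Because the .index file is plain JSON, it can be inspected directly with the standard library. A minimal sketch; describe_index is a hypothetical helper for illustration, not part of the Kapso API, and the field names follow the example above:

```python
import json
import tempfile

def describe_index(path):
    """Load a Kapso .index file and report its key fields."""
    with open(path) as f:
        idx = json.load(f)
    print(f"backend: {idx['search_backend']}  "
          f"source: {idx['data_source']}  pages: {idx['page_count']}")
    return idx

# Demo with the sample fields shown above; point it at your real .index file.
sample = {
    "version": "1.0",
    "data_source": "data/wikis_llm_finetuning",
    "search_backend": "kg_graph_search",
    "page_count": 99,
}
with tempfile.NamedTemporaryFile("w", suffix=".index", delete=False) as f:
    json.dump(sample, f)
idx = describe_index(f.name)
```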

KG Search Backends

Backend             Data Format                    Storage             Use Case
kg_graph_search     Wiki pages (.md/.mediawiki)    Weaviate + Neo4j    Semantic search with LLM reranking
kg_llm_navigation   JSON (nodes/edges)             Neo4j only          LLM-guided graph navigation

Environment Variables Reference

Variable              Required  Default                  Description
OPENAI_API_KEY        Yes       -                        OpenAI API key (also used for embeddings)
GOOGLE_API_KEY        No        -                        Google API key for Gemini
ANTHROPIC_API_KEY     No        -                        Anthropic API key for Claude
LEEROOPEDIA_API_KEY   No        -                        Leeroopedia API key (sign up at app.leeroopedia.com)
NEO4J_URI             No        bolt://localhost:7687    Neo4j connection URI
NEO4J_USER            No        neo4j                    Neo4j username
NEO4J_PASSWORD        No        password                 Neo4j password
WEAVIATE_URL          No        http://localhost:8080    Weaviate server URL
MW_BASE               No        http://localhost:8090    MediaWiki base URL
MW_USER               No        admin                    MediaWiki username
MW_PASS               No        adminpass123             MediaWiki password
CUDA_DEVICE           No        0                        GPU device for ML training
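
The defaults above can be merged with the live environment in a few lines. effective_config is a hypothetical helper for illustration, not a Kapso API; the default values are copied from the table:

```python
import os

# Defaults copied from the table above; set the env var to override.
DEFAULTS = {
    "NEO4J_URI": "bolt://localhost:7687",
    "NEO4J_USER": "neo4j",
    "NEO4J_PASSWORD": "password",
    "WEAVIATE_URL": "http://localhost:8080",
    "MW_BASE": "http://localhost:8090",
    "MW_USER": "admin",
    "MW_PASS": "adminpass123",
    "CUDA_DEVICE": "0",
}

def effective_config():
    """Return the documented defaults, overridden by any set env vars."""
    return {k: os.environ.get(k, v) for k, v in DEFAULTS.items()}

cfg = effective_config()
```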

Verify Installation

# Check core installation
python -c "from kapso.kapso import Kapso; print('Kapso OK')"

# Check orchestrator
python -c "from kapso.execution.orchestrator import OrchestratorAgent; print('Orchestrator OK')"

# Check knowledge search
python -c "from kapso.knowledge_base.search import KnowledgeSearchFactory; print('Knowledge Search OK')"

# Check MLE-Bench (if installed)
python -c "import mlebench; print('MLE-Bench OK')"

# Check ALE-Bench (if installed)
python -c "import ale_bench; print('ALE-Bench OK')"

# Check Neo4j driver
python -c "from neo4j import GraphDatabase; print('Neo4j driver OK')"

# Check Weaviate client
python -c "import weaviate; print('Weaviate client OK')"

Troubleshooting

# Check Docker logs
docker compose -f services/infrastructure/docker-compose.yml logs

# Check individual services
docker logs weaviate
docker logs neo4j

# View MediaWiki logs
docker compose -f services/infrastructure/docker-compose.yml logs wiki

# Full reset (deletes all data)
./scripts/stop_infra.sh --volumes
./scripts/start_infra.sh
If default ports are in use, modify services/infrastructure/docker-compose.yml or use different port mappings.