The knowledge graph uses Neo4j to store domain expertise. This guide covers setup and data loading.

Quick Start

1. Start Neo4j container

docker run -d \
    --name neo4j \
    --restart unless-stopped \
    -p 7474:7474 -p 7687:7687 \
    -e NEO4J_AUTH=neo4j/password \
    neo4j:latest

2. Configure environment

Add to .env:
NEO4J_URI=bolt://localhost:7687
NEO4J_USER=neo4j
NEO4J_PASSWORD=password

3. Load knowledge data

PYTHONPATH=. python -c "
from src.knowledge.search.kg_llm_navigation_search import KGLLMNavigationSearch
import json

search = KGLLMNavigationSearch()
with open('benchmarks/mle/data/kg_data.json') as f:
    search.index(json.load(f))
print('Knowledge graph loaded successfully')
"

4. Verify

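Open http://localhost:7474 in your browser, log in with the credentials set in NEO4J_AUTH, and run this query (a non-zero nodeCount confirms the data loaded):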
MATCH (n) RETURN count(n) as nodeCount

Environment Variables

Variable        Default                  Description
NEO4J_URI       bolt://localhost:7687    Connection URI
NEO4J_USER      neo4j                    Username
NEO4J_PASSWORD  password                 Password
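
As an illustration of how these variables and their defaults fit together, here is a minimal Python sketch. It assumes the variables have already been exported into the process environment (for example by whatever loads .env); the function name load_neo4j_settings is hypothetical and not part of the project.

```python
import os

# Defaults copied from the Environment Variables section of this guide.
DEFAULTS = {
    "NEO4J_URI": "bolt://localhost:7687",
    "NEO4J_USER": "neo4j",
    "NEO4J_PASSWORD": "password",
}


def load_neo4j_settings(env=None):
    """Return connection settings, falling back to the documented defaults.

    `env` is any mapping of variable names to values; it defaults to
    os.environ. This helper is illustrative only.
    """
    env = os.environ if env is None else env
    return {name: env.get(name, default) for name, default in DEFAULTS.items()}


settings = load_neo4j_settings()
# Print only the non-secret values.
print(settings["NEO4J_URI"], settings["NEO4J_USER"])
```

Passing a plain dict instead of os.environ makes the fallback behavior easy to check without touching the real environment.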

Docker Options

Persistent Storage

docker run -d \
    --name neo4j \
    --restart unless-stopped \
    -p 7474:7474 -p 7687:7687 \
    -e NEO4J_AUTH=neo4j/password \
    -v neo4j_data:/data \
    neo4j:latest

Custom Password

docker run -d \
    --name neo4j \
    -p 7474:7474 -p 7687:7687 \
    -e NEO4J_AUTH=neo4j/mysecurepassword \
    neo4j:latest
Then set NEO4J_PASSWORD in .env to the same value.

Memory Settings

docker run -d \
    --name neo4j \
    -p 7474:7474 -p 7687:7687 \
    -e NEO4J_AUTH=neo4j/password \
    -e NEO4J_dbms_memory_heap_max__size=2G \
    neo4j:latest
The variable name above follows the Neo4j 4.x setting dbms.memory.heap.max_size. Neo4j 5 renamed that setting to server.memory.heap.max_size, so with a 5.x image the variable becomes NEO4J_server_memory_heap_max__size.

Knowledge Data Format

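The knowledge graph expects JSON with a nodes object keyed by node ID and an edges list. As the example shows, an edge may carry an optional relationship field: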
{
  "nodes": {
    "0": {
      "id": "0",
      "name": "Kaggle competitions",
      "type": "specialization",
      "content": "Tips for solving Kaggle competitions..."
    },
    "1": {
      "id": "1",
      "name": "Text Classification",
      "type": "workflow",
      "content": "Best approaches for text classification..."
    }
  },
  "edges": [
    {"source": "0", "target": "1"},
    {"source": "1", "target": "2", "relationship": "HAS_CONCEPT"}
  ]
}

Node Types

Type            Purpose
specialization  Root category
workflow        Step-by-step approach
concept         Detailed tips
code            Code examples
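
Before loading a data file, it can help to check it against the format described above. The sketch below is one possible check, not the project's own validation. It assumes that each node's id matches its key, that type is one of the four listed node types, and that every edge endpoint refers to a node in the file. The name validate_kg is hypothetical.

```python
# The four node types listed in this guide.
KNOWN_TYPES = {"specialization", "workflow", "concept", "code"}


def validate_kg(data):
    """Return a list of human-readable problems found in a knowledge-graph dict."""
    problems = []
    nodes = data.get("nodes", {})
    for key, node in nodes.items():
        # Assumption: the "id" field repeats the node's key.
        if node.get("id") != key:
            problems.append(f"node {key!r}: id field is {node.get('id')!r}")
        if node.get("type") not in KNOWN_TYPES:
            problems.append(f"node {key!r}: unknown type {node.get('type')!r}")
    for i, edge in enumerate(data.get("edges", [])):
        # Assumption: both endpoints must exist in "nodes".
        for end in ("source", "target"):
            if edge.get(end) not in nodes:
                problems.append(f"edge {i}: {end} {edge.get(end)!r} is not a known node")
    return problems


# The abbreviated example from this guide (content shortened).
sample = {
    "nodes": {
        "0": {"id": "0", "name": "Kaggle competitions", "type": "specialization", "content": "..."},
        "1": {"id": "1", "name": "Text Classification", "type": "workflow", "content": "..."},
    },
    "edges": [
        {"source": "0", "target": "1"},
        {"source": "1", "target": "2", "relationship": "HAS_CONCEPT"},
    ],
}
print(validate_kg(sample))
# prints ["edge 1: target '2' is not a known node"]
```

The example in this guide shows only nodes "0" and "1", so the check flags the edge that points at "2". A complete data file would need to define that node.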

Verify Connection

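from neo4j import GraphDatabase

# Use the same URI and credentials as in .env
driver = GraphDatabase.driver(
    "bolt://localhost:7687",
    auth=("neo4j", "password")
)

# Raises an exception if the server is unreachable or the credentials are wrong
driver.verify_connectivity()

with driver.session() as session:
    result = session.run("MATCH (n) RETURN count(n) AS count")
    print(f"Nodes: {result.single()['count']}")

driver.close()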

Troubleshooting

Check logs:
docker logs neo4j
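Common issues:
  • Port already in use: Stop whichever container or process is bound to 7474 or 7687
  • Memory: Reduce heap size or increase Docker memory
If the connection is refused:
  1. Wait 30-60 seconds for Neo4j to initialize
  2. Check container status: docker ps
  3. Verify ports: docker port neo4j
If authentication fails:
  1. Verify the password in the NEO4J_AUTH environment variable
  2. Check that the .env file matches the container settings
  3. Reset: run docker rm -f neo4j and recreate the container (NEO4J_AUTH only sets the initial password, so with the neo4j_data volume the original password persists until the volume is removed)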
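
The advice to wait 30-60 seconds for initialization can be automated by polling the Bolt port until it accepts connections. This is a minimal sketch using only the standard library. The function name wait_for_port and its defaults are illustrative assumptions. An open port shows only that something is listening, not that Neo4j has finished starting.

```python
import socket
import time


def wait_for_port(host="localhost", port=7687, timeout=60.0, interval=1.0):
    """Poll until a TCP connection to host:port succeeds or `timeout` elapses.

    Returns True once a connection is accepted, False if the deadline passes.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError if nothing is listening yet.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Calling wait_for_port() with no arguments checks localhost:7687 for up to 60 seconds. If it returns False, the container status and port checks listed above are the next step.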

Disable Knowledge Graph

If you don’t need the knowledge graph:
# CLI flag
PYTHONPATH=. python -m benchmarks.mle.runner -c competition --no-kg
Or in config:
knowledge_search:
  enabled: false