MLE-Bench Runner

PYTHONPATH=. python -m benchmarks.mle.runner [OPTIONS]

Options

| Option | Short | Type | Default | Description |
|--------|-------|------|---------|-------------|
| `--competition` | `-c` | string | Required | Competition ID |
| `--iterations` | `-i` | int | 20 | Max experiment iterations |
| `--mode` | `-m` | string | `MLE_CONFIGS` | Configuration mode |
| `--coding-agent` | `-d` | string | From config | Coding agent type |
| `--no-kg` | - | flag | - | Disable knowledge graph |
| `--list` | - | flag | - | List all competitions |
| `--lite` | - | flag | - | List lite competitions |
| `--list-agents` | - | flag | - | List coding agents |

Examples

# Solve a competition
PYTHONPATH=. python -m benchmarks.mle.runner -c tabular-playground-series-dec-2021

# With custom settings
PYTHONPATH=. python -m benchmarks.mle.runner \
    -c spooky-author-identification \
    -i 10 \
    -m MINIMAL \
    -d gemini \
    --no-kg

# List competitions
PYTHONPATH=. python -m benchmarks.mle.runner --list
PYTHONPATH=. python -m benchmarks.mle.runner --lite

ALE-Bench Runner

PYTHONPATH=. python -m benchmarks.ale.runner [OPTIONS]

Options

| Option | Short | Type | Default | Description |
|--------|-------|------|---------|-------------|
| `--problem` | `-p` | string | Required | Problem ID (e.g., `ahc039`) |
| `--iterations` | `-i` | int | 14 | Max experiment iterations |
| `--mode` | `-m` | string | `ALE_CONFIGS` | Configuration mode |
| `--coding-agent` | `-d` | string | From config | Coding agent type |
| `--list` | - | flag | - | List all problems |
| `--lite` | - | flag | - | List lite problems |
| `--list-agents` | - | flag | - | List coding agents |

Examples

# Solve a problem
PYTHONPATH=. python -m benchmarks.ale.runner -p ahc039

# With custom settings
PYTHONPATH=. python -m benchmarks.ale.runner \
    -p ahc039 \
    -i 10 \
    -m MINIMAL

# List problems
PYTHONPATH=. python -m benchmarks.ale.runner --list
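PYTHONPATH=. python -m benchmarks.ale.runner --lite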

Generic Runner

PYTHONPATH=. python -m src.runner [OPTIONS]

Options

| Option | Short | Type | Default | Description |
|--------|-------|------|---------|-------------|
| `--problem` | `-p` | string | - | Problem description (inline) |
| `--problem-file` | `-f` | string | - | File with problem description |
| `--iterations` | `-i` | int | 10 | Max experiment iterations |
| `--mode` | `-m` | string | `GENERIC` | Configuration mode |
| `--coding-agent` | `-d` | string | From config | Coding agent type |
| `--main-file` | - | string | `main.py` | Entry point file |
| `--language` | - | string | `python` | Programming language |
| `--timeout` | - | int | 300 | Execution timeout (seconds) |
| `--evaluator` | - | string | `no_score` | Evaluator type |
| `--stop-condition` | - | string | `never` | Stop condition type |
| `--context` | - | string | - | Additional context |
| `--list-agents` | - | flag | - | List coding agents |
| `--list-evaluators` | - | flag | - | List evaluator types |
| `--list-stop-conditions` | - | flag | - | List stop condition types |

Examples

# Inline problem
PYTHONPATH=. python -m src.runner -p "Create a prime number generator"

# From file
PYTHONPATH=. python -m src.runner -f problem.txt -i 10

# With scoring
PYTHONPATH=. python -m src.runner \
    -f problem.txt \
    --evaluator regex_pattern \
    --stop-condition threshold

# List options
PYTHONPATH=. python -m src.runner --list-evaluators
PYTHONPATH=. python -m src.runner --list-stop-conditions
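
The execution options combine in the same way; the values below are illustrative, not project defaults:

# Custom entry point, longer timeout, extra context (illustrative values)
PYTHONPATH=. python -m src.runner \
    -f problem.txt \
    --main-file solver.py \
    --timeout 600 \
    --context "Prioritize memory efficiency"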

Coding Agents

Available agents (use with `-d`):

| Agent | Description |
|-------|-------------|
| `aider` | Aider CLI tool (default) |
| `gemini` | Google Gemini API |
| `claude_code` | Anthropic Claude |
| `openhands` | OpenHands platform |

List with details:
PYTHONPATH=. python -m benchmarks.mle.runner --list-agents
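
Every runner accepts `-d`, so switching agents is a one-flag change. A sketch, assuming `claude_code` is configured and `ANTHROPIC_API_KEY` is set:

# Run an MLE-Bench competition with the Claude Code agent
PYTHONPATH=. python -m benchmarks.mle.runner -c spooky-author-identification -d claude_code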

Configuration Modes

MLE-Bench

| Mode | Description |
|------|-------------|
| `MLE_CONFIGS` | Production (full features) |
| `HEAVY_EXPERIMENTATION` | Many parallel experiments |
| `LINEAR` | Sequential search (testing) |
| `MINIMAL` | Fast testing mode |
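
Modes are selected with `-m`. A sketch of a heavier search run, reusing the competition ID from the examples above:

# Many parallel experiments on a tabular competition
PYTHONPATH=. python -m benchmarks.mle.runner -c tabular-playground-series-dec-2021 -m HEAVY_EXPERIMENTATION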

ALE-Bench

| Mode | Description |
|------|-------------|
| `ALE_CONFIGS` | Production |

Generic

| Mode | Description |
|------|-------------|
| `GENERIC` | Standard configuration |
| `MINIMAL` | Fast testing mode |
| `TREE_SEARCH` | Tree search for complex problems |
| `SCORED` | With regex evaluator |
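
Per the table, `SCORED` expects a regex evaluator, so a plausible pairing with the generic runner's scoring flags looks like:

# Scored run with a regex evaluator (illustrative combination)
PYTHONPATH=. python -m src.runner -f problem.txt -m SCORED --evaluator regex_pattern --stop-condition threshold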

Environment Variables

| Variable | Required | Description |
|----------|----------|-------------|
| `OPENAI_API_KEY` | Yes | OpenAI API key |
| `GOOGLE_API_KEY` | Yes | Google API key |
| `ANTHROPIC_API_KEY` | No | Anthropic API key |
| `CUDA_DEVICE` | No | GPU device ID (default: 0) |
| `NEO4J_URI` | No | Neo4j connection URI |
| `NEO4J_USER` | No | Neo4j username |
| `NEO4J_PASSWORD` | No | Neo4j password |
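
A minimal shell setup, with placeholder values; the Neo4j variables presumably back the knowledge graph and can be omitted when running with `--no-kg`:

export OPENAI_API_KEY="your-openai-key"
export GOOGLE_API_KEY="your-google-key"
export ANTHROPIC_API_KEY="your-anthropic-key"  # optional; presumably used by claude_code
export CUDA_DEVICE=0                           # optional, default 0
export NEO4J_URI="bolt://localhost:7687"       # optional, placeholder URI
export NEO4J_USER="neo4j"                      # optional
export NEO4J_PASSWORD="your-password"          # optional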