ALE-Bench is a benchmark built from AtCoder Heuristic Contest (AHC) problems. The Tinkerer Agent generates C++ solutions, which are evaluated in Docker containers.

Usage

# List available problems
PYTHONPATH=. python -m benchmarks.ale.runner --list

# List lite benchmark problems
PYTHONPATH=. python -m benchmarks.ale.runner --lite

# Solve a problem
PYTHONPATH=. python -m benchmarks.ale.runner -p ahc039

# With options
PYTHONPATH=. python -m benchmarks.ale.runner \
    -p ahc039 \
    -i 14 \
    -m ALE_CONFIGS \
    -d aider

CLI Options

| Option | Description | Default |
|---|---|---|
| -p, --problem | Problem ID (e.g., ahc039) | Required |
| -i, --iterations | Max experiment iterations | 14 |
| -m, --mode | Config mode | ALE_CONFIGS |
| -d, --coding-agent | Coding agent | From config |
| --list | List all problems | - |
| --lite | List lite problems | - |
| --list-agents | List coding agents | - |

Available Problems

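| Problem | Contestants | Scoring |
|---|---|---|
| ahc008 | 824 | Maximize |
| ahc011 | 926 | Maximize |
| ahc015 | 779 | Maximize |
| ahc016 | 1047 | Maximize |
| ahc024 | 664 | Maximize |
| ahc025 | 879 | Minimize |
| ahc026 | 740 | Maximize |
| ahc027 | 999 | Minimize |
| ahc039 | 683 | Maximize |
| ahc046 | 939 | Maximize |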

Output Structure

The agent generates:
experiment_workspace/{uuid}/
├── main.cpp          # C++ solution
├── pre_run.cpp       # Optional precomputation (max 1 min)
└── sessions/         # Experiment branches

Evaluation

Code Requirements

Generated C++ must:
  • Be time-aware (stop computing at time_limit - 100ms, leaving the remaining 100ms for I/O)
  • Handle all input constraints
  • Use efficient algorithms and data structures
  • Include compiler optimization pragmas if helpful
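
The time-awareness requirement above can be sketched as a deadline check inside the main search loop. This is a minimal illustration, not the code the agent actually emits: `Deadline` and `run_until` are hypothetical names, and the 100 ms default margin is taken from the requirement above. The clock is polled only every `check_every` iterations so that timing calls do not dominate a tight loop.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical helper: a wall-clock deadline of (time limit - safety margin).
class Deadline {
public:
    using Clock = std::chrono::steady_clock;

    explicit Deadline(std::chrono::milliseconds time_limit,
                      std::chrono::milliseconds margin = std::chrono::milliseconds(100))
        : end_(Clock::now() + (time_limit - margin)) {
        assert(margin < time_limit);  // the margin must leave some time to compute
    }

    // True once the compute budget has been used up.
    bool expired() const { return Clock::now() >= end_; }

private:
    Clock::time_point end_;
};

// Hypothetical driver: calls `step` repeatedly until the deadline passes and
// returns the number of completed steps. The clock is read only once every
// `check_every` iterations.
template <class Step>
std::uint64_t run_until(const Deadline& deadline, Step step,
                        std::uint64_t check_every = 256) {
    assert(check_every > 0);
    std::uint64_t iterations = 0;
    while (true) {
        if (iterations % check_every == 0 && deadline.expired()) break;
        step();
        ++iterations;
    }
    return iterations;
}
```

In a solution, `step` would typically be one local-search or annealing move, and the best answer found so far would be printed after `run_until` returns. The value of `check_every` is a tuning assumption: it should be small enough that a batch of steps cannot overrun the margin.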

Built-in Domain Knowledge

The handler includes tips for common algorithms:
  • Design good state representation
  • Balance small and large moves
  • Avoid recomputation in legality checks
  • Keep a regret mechanism for constrained problems
  • Define strong heuristic scoring
  • Consider the average and standard deviation of scores
  • Balance greedy vs long-horizon moves
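
As one illustration of the tip about average and standard deviation, the sketch below summarizes the scores a solution obtains across several test cases or seeds. `ScoreStats` and `summarize` are hypothetical names, not part of ALE-Bench or the handler, and the population standard deviation is an assumed choice.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical summary of per-case scores.
struct ScoreStats {
    double mean;
    double stddev;  // population standard deviation (divides by n)
};

// Computes the mean and standard deviation of a non-empty list of scores.
ScoreStats summarize(const std::vector<double>& scores) {
    assert(!scores.empty());
    const double n = static_cast<double>(scores.size());

    double sum = 0.0;
    for (double s : scores) sum += s;
    const double mean = sum / n;

    double squared_deviation = 0.0;
    for (double s : scores) squared_deviation += (s - mean) * (s - mean);

    return {mean, std::sqrt(squared_deviation / n)};
}
```

Comparing two candidate solutions on both numbers, rather than on the mean alone, helps spot a change that raises the average while making results on individual cases much less stable.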

Key Differences from MLE-Bench

| Aspect | MLE-Bench | ALE-Bench |
|---|---|---|
| Language | Python | C++ (cpp23) |
| Main file | main.py | main.cpp |
| Debug mode | --debug flag | N/A |
| Evaluation | CSV grading | Docker tests |
| Stop condition | Medal achieved | Never (fixed iterations) |
| Knowledge graph | Recommended | Disabled by default |