ALE-Bench provides AtCoder Heuristic Contest problems for evaluating algorithmic optimization agents. Kapso achieved #1 on ALE-Bench.
ALE-Bench Results

Usage

# List available problems
PYTHONPATH=. python -m benchmarks.ale.runner --list

# List lite benchmark problems
PYTHONPATH=. python -m benchmarks.ale.runner --lite

# Solve a problem
PYTHONPATH=. python -m benchmarks.ale.runner -p ahc039

# With options
PYTHONPATH=. python -m benchmarks.ale.runner \
    -p ahc039 \
    -i 14 \
    -m ALE_CONFIGS \
    -d aider

CLI Options

Option                Description                   Default
-p, --problem         Problem ID (e.g., ahc039)     Required
-i, --iterations      Max experiment iterations     14
-m, --mode            Config mode                   ALE_CONFIGS
-d, --coding-agent    Coding agent                  From config
--list                List all problems             -
--lite                List lite problems            -
--list-agents         List coding agents            -

Available Problems

ahc008, ahc011, ahc015, ahc016, ahc024, ahc025, ahc026, ahc027, ahc039, ahc046
ALE-Bench uses the benchmark_tree_search strategy, which relies on the handler’s built-in evaluation via handler.run(). This differs from kapso.evolve(), which uses agent-built evaluation.

Output Structure

The agent generates:
experiment_workspace/{uuid}/
├── main.cpp          # C++ solution
├── pre_run.cpp       # Optional precomputation (max 1 min)
└── sessions/         # Experiment branches

Evaluation

The evaluation process works as follows:
  1. Code Submission: The main.cpp file is read from the experiment workspace
  2. Docker Evaluation: Code is sent to ale_bench.public_eval() which compiles and runs in an isolated Docker container
  3. Test Execution: Solution runs against all test cases with strict time limits
  4. Validation: Each test case must return ACCEPTED with a non-zero score
  5. Score Stabilization: If all tests pass, the solution runs 4 additional times and scores are averaged for stability
  6. Final Ranking: Private evaluation compares against original contest participants
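
The validation and stabilization rules in steps 4 and 5 can be sketched as two small functions. This is an illustration of the logic only, not the handler's code: the CaseResult struct, its field names, and both function names are hypothetical, and whether the first run is included in the average alongside the 4 repeats is an assumption left to the caller.

```cpp
#include <numeric>
#include <string>
#include <vector>

// Hypothetical per-test-case result; the field names are illustrative only.
struct CaseResult {
    std::string judge;  // judge verdict, e.g. "ACCEPTED"
    long long score;    // score reported for this test case
};

// Step 4: a run counts as valid only if every case is ACCEPTED
// and has a non-zero score. An empty result list is treated as invalid.
bool all_cases_valid(const std::vector<CaseResult>& cases) {
    if (cases.empty()) return false;
    for (const CaseResult& c : cases) {
        if (c.judge != "ACCEPTED" || c.score == 0) return false;
    }
    return true;
}

// Step 5: average the overall scores of the runs passed in.
// The caller decides which runs to include (assumption, see lead-in).
double stabilized_score(const std::vector<double>& run_scores) {
    if (run_scores.empty()) return 0.0;
    double total = std::accumulate(run_scores.begin(), run_scores.end(), 0.0);
    return total / static_cast<double>(run_scores.size());
}
```

Averaging over repeated runs matters because most heuristic solutions are randomized or time-bounded, so a single run can land noticeably above or below the solution's typical score.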

Code Requirements

Generated C++ must:
  • Be time-aware (stop searching at time_limit - 100 ms, leaving the margin for I/O)
  • Handle all input constraints
  • Use efficient algorithms and data structures
  • Include compiler optimization pragmas if helpful
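
As a minimal sketch of the time-awareness requirement, the helper below stops a search loop once the elapsed time reaches the time limit minus a 100 ms margin. The Deadline class and run_until function are hypothetical names introduced for illustration, and polling the clock every 256 iterations is an arbitrary choice to keep clock overhead low.

```cpp
#include <chrono>
#include <cstdint>

// Tracks wall-clock time against a budget of time_limit minus a safety margin.
class Deadline {
public:
    explicit Deadline(double time_limit_ms, double margin_ms = 100.0)
        : start_(std::chrono::steady_clock::now()),
          budget_ms_(time_limit_ms - margin_ms) {}

    // Milliseconds elapsed since this object was constructed.
    double elapsed_ms() const {
        auto now = std::chrono::steady_clock::now();
        return std::chrono::duration<double, std::milli>(now - start_).count();
    }

    bool expired() const { return elapsed_ms() >= budget_ms_; }
    double budget_ms() const { return budget_ms_; }

private:
    std::chrono::steady_clock::time_point start_;
    double budget_ms_;
};

// Calls step() repeatedly until the deadline passes.
// The clock is polled only every 256 iterations to keep overhead low.
template <typename Step>
std::uint64_t run_until(const Deadline& deadline, Step step) {
    std::uint64_t iterations = 0;
    while (true) {
        if ((iterations & 255u) == 0 && deadline.expired()) break;
        step();
        ++iterations;
    }
    return iterations;
}
```

In a solution, the Deadline would be constructed at the very start of main, before reading input, so that parsing time counts against the budget. If a single step is expensive, the polling interval should be reduced so the loop cannot overshoot the margin.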

Built-in Domain Knowledge

The handler includes tips for common algorithms:
  • Design good state representation
  • Balance small and large moves
  • Avoid recomputation in legality checks
  • Keep a regret mechanism for constrained problems
  • Define strong heuristic scoring
  • Consider the mean and standard deviation of scores
  • Balance greedy vs long-horizon moves
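
To make a few of these tips concrete, here is a small simulated annealing sketch on a toy problem: reorder a sequence to minimize the sum of absolute differences between neighbours. It is not taken from the handler. The toy objective, the function names, the 70/30 move mix, and the temperature schedule are all assumptions chosen for illustration. It shows a compact state representation, a mix of small moves (adjacent swaps) and large moves (segment reversals), and incremental scoring that avoids recomputing the full cost.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdlib>
#include <random>
#include <utility>
#include <vector>

// Toy objective: sum of |a[i+1] - a[i]| over the sequence (lower is better).
long long path_cost(const std::vector<int>& a) {
    long long cost = 0;
    for (std::size_t i = 0; i + 1 < a.size(); ++i) cost += std::abs(a[i + 1] - a[i]);
    return cost;
}

// Cost change from reversing a[l..r]; only the two boundary edges change.
long long reverse_delta(const std::vector<int>& a, int l, int r) {
    const int n = static_cast<int>(a.size());
    long long delta = 0;
    if (l > 0) delta += std::abs(a[l - 1] - a[r]) - std::abs(a[l - 1] - a[l]);
    if (r + 1 < n) delta += std::abs(a[l] - a[r + 1]) - std::abs(a[r] - a[r + 1]);
    return delta;
}

// Iteration-bounded annealing; a contest solution would bound by time instead.
std::vector<int> anneal(std::vector<int> state, int iterations, unsigned seed) {
    const int n = static_cast<int>(state.size());
    if (n < 2 || iterations <= 0) return state;
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> unit(0.0, 1.0);
    std::uniform_int_distribution<int> any_pos(0, n - 1);
    std::uniform_int_distribution<int> left_pos(0, n - 2);

    long long cost = path_cost(state);
    std::vector<int> best = state;
    long long best_cost = cost;
    const double t_start = 50.0, t_end = 0.1;  // assumed schedule endpoints

    for (int it = 0; it < iterations; ++it) {
        // Geometric cooling from t_start toward t_end.
        double progress = static_cast<double>(it) / iterations;
        double temp = t_start * std::pow(t_end / t_start, progress);

        int l, r;
        if (unit(rng) < 0.7) {  // small move: swap two adjacent elements
            l = left_pos(rng);
            r = l + 1;
        } else {                // large move: reverse a random segment
            l = any_pos(rng);
            r = any_pos(rng);
            if (l > r) std::swap(l, r);
            if (l == r) continue;
        }

        long long delta = reverse_delta(state, l, r);
        // Always accept improvements; accept worse moves with Boltzmann probability.
        if (delta <= 0 || unit(rng) < std::exp(-static_cast<double>(delta) / temp)) {
            std::reverse(state.begin() + l, state.begin() + r + 1);
            cost += delta;
            if (cost < best_cost) {
                best_cost = cost;
                best = state;
            }
        }
    }
    return best;  // best state seen, never worse than the input
}
```

Because the best state is tracked separately from the current state, accepting worse moves early on cannot degrade the returned answer. The delta function is what keeps each iteration O(1) in scoring, which is usually where heuristic contest solutions gain most of their iteration count.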