Tinkerer Agent uses YAML configuration files to define modes, strategies, and component settings.

Configuration Files

File                         Purpose
src/config.yaml              Generic runner configuration
benchmarks/mle/config.yaml   MLE-Bench configuration
benchmarks/ale/config.yaml   ALE-Bench configuration

Structure

# Default mode to use
default_mode: MLE_CONFIGS

# Available modes
modes:
  MLE_CONFIGS:
    search_strategy:
      type: "llm_tree_search"
      params: {...}
    
    coding_agent:
      type: "aider"
      model: "gpt-4.1"
    
    context_manager:
      type: "kg_enriched"
      params: {...}
    
    knowledge_search:
      type: "kg_llm_navigation"
      enabled: true
      params: {...}
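
To make the layout concrete, here is a minimal sketch of loading this file and resolving the active mode with PyYAML; the function is illustrative, not the runner's actual loader:

import yaml

def load_mode(path="src/config.yaml", mode_name=None):
    # Fall back to default_mode when no mode override is given
    with open(path) as f:
        config = yaml.safe_load(f)
    return config["modes"][mode_name or config["default_mode"]]

mode = load_mode()                    # resolves default_mode: MLE_CONFIGS
print(mode["coding_agent"]["type"])   # "aider"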

Search Strategy
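
The search strategy is configured per mode. Only two llm_tree_search parameters appear on this page, in the example modes below; a sketch using them:

search_strategy:
  type: "llm_tree_search"
  params:
    reasoning_effort: "high"   # values shown on this page: "high", "medium"
    code_debug_tries: 15       # values shown on this page: 15, 2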

Coding Agent

coding_agent:
  type: "aider"           # aider, gemini, claude_code, openhands
  model: "gpt-4.1"        # Primary model
  debug_model: "gpt-4.1-mini"  # Model for debugging
Override via CLI: --coding-agent gemini

Context Manager

context_manager:
  type: "kg_enriched"
  params:
    max_experiment_history_count: 5   # Top experiments to include
    max_recent_experiment_count: 5    # Recent experiments to include
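
Taken at face value, the two limits select up to five top-scoring and five most recent experiments for the prompt context. A sketch of that assumed selection (the real kg_enriched manager presumably also performs knowledge-graph enrichment):

def select_context_experiments(experiments, max_history=5, max_recent=5):
    # experiments: list of dicts with "id" and "score", in run order
    top = sorted(experiments, key=lambda e: e["score"], reverse=True)[:max_history]
    recent = experiments[-max_recent:]
    picked, seen = [], set()
    for exp in top + recent:
        if exp["id"] not in seen:   # de-duplicate any overlap (assumption)
            seen.add(exp["id"])
            picked.append(exp)
    return picked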

Knowledge Search

knowledge_search:
  type: "kg_llm_navigation"
  enabled: true
  params:
    search_top_k: 1           # Initial nodes to find
    navigation_steps: 3       # Graph navigation depth
    expansion_limit: 3        # Nodes per step
    search_node_type: "specialization"
Or use a preset:
knowledge_search:
  type: "kg_llm_navigation"
  enabled: true
  preset: "DEEP_SEARCH"

Evaluator (Generic Runner)

evaluator:
  type: "regex_pattern"
  params:
    pattern: 'SCORE:\s*([-+]?\d*\.?\d+)'
    default_score: 0.0
Available types:
  • no_score - Always returns 0
  • regex_pattern - Parse score from output
  • file_json - Read score from JSON file
  • llm_judge - LLM-based evaluation
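
In Python terms, the regex_pattern evaluator above amounts to the following (a minimal equivalent; the function name is illustrative):

import re

def score_from_output(output, pattern=r"SCORE:\s*([-+]?\d*\.?\d+)",
                      default_score=0.0):
    # Return the first captured score, or the default when none is found
    match = re.search(pattern, output)
    return float(match.group(1)) if match else default_score

print(score_from_output("epoch 3 done\nSCORE: 0.8734"))  # 0.8734
print(score_from_output("crashed before scoring"))       # 0.0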

Stop Condition (Generic Runner)

stop_condition:
  type: "composite"
  params:
    conditions:
      - ["threshold", {"threshold": 0.95}]
      - ["max_iterations", {"max_iter": 50}]
    mode: "any"
Available types:
  • never - Run all iterations
  • threshold - Stop at score threshold
  • plateau - Stop if no improvement
  • composite - Combine conditions
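
The composite example above stops once either condition fires (mode: "any"). A sketch of that combination logic, with plain functions standing in for the condition types:

def threshold_met(scores, threshold=0.95):
    return bool(scores) and max(scores) >= threshold

def max_iterations_reached(scores, max_iter=50):
    return len(scores) >= max_iter

def should_stop(scores, conditions, mode="any"):
    # mode "any": stop when one condition fires; "all": require every one
    results = [check(scores, **params) for check, params in conditions]
    return any(results) if mode == "any" else all(results)

conditions = [(threshold_met, {"threshold": 0.95}),
              (max_iterations_reached, {"max_iter": 50})]
print(should_stop([0.4, 0.96], conditions))  # True: threshold reached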

Example Modes

Production (MLE-Bench)

MLE_CONFIGS:
  search_strategy:
    type: "llm_tree_search"
    params:
      reasoning_effort: "high"
      code_debug_tries: 15
  coding_agent:
    type: "aider"
    model: "o3"
  knowledge_search:
    enabled: true

Testing

MINIMAL:
  search_strategy:
    type: "llm_tree_search"
    params:
      reasoning_effort: "medium"
      code_debug_tries: 2
  coding_agent:
    type: "aider"
    model: "gpt-4.1-mini"
  knowledge_search:
    enabled: false

CLI Override

Most settings can be overridden via CLI:
PYTHONPATH=. python -m benchmarks.mle.runner \
    -c competition \
    -m MINIMAL \
    -d gemini \
    -i 10 \
    --no-kg

(Inline comments after a trailing backslash would break the line continuation, so the flags are listed here instead.)
  • -m MINIMAL - Mode override
  • -d gemini - Coding agent override
  • -i 10 - Iterations override
  • --no-kg - Disable knowledge graph