Coding agents generate code from solution descriptions. Every agent sits behind a common interface, so one agent can be swapped for another without changing the calling code.

Available Agents

Agent        Description         Git Support
aider        Aider CLI tool      Native
gemini       Google Gemini API   Session
claude_code  Anthropic Claude    Session
openhands    OpenHands platform  Session

CLI Selection

# Use specific agent
PYTHONPATH=. python -m benchmarks.mle.runner -c competition -d gemini

# List available agents
PYTHONPATH=. python -m benchmarks.mle.runner --list-agents

Configuration

In your mode’s YAML config:
coding_agent:
  type: "aider"
  model: "gpt-4.1"
  debug_model: "gpt-4.1-mini"
Or via factory:
from src.execution.coding_agents.factory import CodingAgentFactory

config = CodingAgentFactory.build_config(
    agent_type="aider",
    model="gpt-4.1",
    debug_model="gpt-4.1-mini",
)

Agent Interface

All agents implement:
from abc import ABC, abstractmethod

class CodingAgentInterface(ABC):
    @abstractmethod
    def initialize(self, workspace: str) -> None:
        """Initialize the agent for a workspace."""
        pass
    
    @abstractmethod
    def generate_code(self, prompt: str, debug_mode: bool = False) -> CodingResult:
        """Generate code from a prompt."""
        pass
    
    @abstractmethod
    def supports_native_git(self) -> bool:
        """Whether agent handles its own git commits."""
        pass
    
    @abstractmethod
    def get_cumulative_cost(self) -> float:
        """Get total cost incurred."""
        pass
    
    @abstractmethod
    def cleanup(self) -> None:
        """Clean up resources."""
        pass
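
As an illustration of the contract above, here is a minimal sketch of an adapter. `EchoAgent`, its `cost_per_call` parameter, and the `solution.py` filename are hypothetical and not part of the project. The interface and `CodingResult` are re-declared locally only so the snippet runs on its own.

```python
import os
import tempfile
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CodingResult:
    # Local copy of the fields documented on this page.
    success: bool
    output: str
    error: Optional[str]
    files_changed: List[str]
    cost: float
    commit_message: Optional[str]


class CodingAgentInterface(ABC):
    # Local copy of the interface so the sketch is self-contained.
    @abstractmethod
    def initialize(self, workspace: str) -> None: ...

    @abstractmethod
    def generate_code(self, prompt: str, debug_mode: bool = False) -> CodingResult: ...

    @abstractmethod
    def supports_native_git(self) -> bool: ...

    @abstractmethod
    def get_cumulative_cost(self) -> float: ...

    @abstractmethod
    def cleanup(self) -> None: ...


class EchoAgent(CodingAgentInterface):
    """Hypothetical stub: writes the prompt into a file instead of calling an LLM."""

    def __init__(self, cost_per_call: float = 0.0) -> None:
        self._workspace: Optional[str] = None
        self._cost_per_call = cost_per_call  # made-up flat fee per call
        self._total_cost = 0.0

    def initialize(self, workspace: str) -> None:
        self._workspace = workspace

    def generate_code(self, prompt: str, debug_mode: bool = False) -> CodingResult:
        if self._workspace is None:
            return CodingResult(False, "", "initialize() was not called", [], 0.0, None)
        code = f"# TODO: {prompt}\n"
        path = os.path.join(self._workspace, "solution.py")
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(code)
        self._total_cost += self._cost_per_call
        return CodingResult(
            True, code, None, ["solution.py"], self._cost_per_call, "Add solution stub"
        )

    def supports_native_git(self) -> bool:
        return False  # the session would commit on this agent's behalf

    def get_cumulative_cost(self) -> float:
        return self._total_cost

    def cleanup(self) -> None:
        self._workspace = None


# Exercise the full lifecycle in a throwaway workspace.
with tempfile.TemporaryDirectory() as workspace:
    agent = EchoAgent(cost_per_call=0.25)
    agent.initialize(workspace)
    result = agent.generate_code("train a baseline model")
    agent.cleanup()
```

The lifecycle is `initialize`, one or more `generate_code` calls, then `cleanup`. Failures are reported through `CodingResult.error` rather than raised in this sketch, which is a design choice of the example and not necessarily of the real adapters.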

CodingResult

from dataclasses import dataclass
from typing import List, Optional

@dataclass
class CodingResult:
    success: bool
    output: str              # Generated code or response
    error: Optional[str]     # Error message if failed
    files_changed: List[str] # Modified files
    cost: float              # API cost
    commit_message: Optional[str]  # Suggested commit message

Git Integration

Agents handle git in one of two ways:
  • Native git: The agent makes its own commits (like Aider)
  • Session git: ExperimentSession commits on the agent's behalf
class ExperimentSession:
    def generate_code(self, prompt, debug_mode=False):
        result = self.coding_agent.generate_code(prompt, debug_mode)
        
        # If agent doesn't handle git, we commit
        if not self._agent_handles_git and result.success:
            self._commit_with_message(result)
        
        return result
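
The branching above can be exercised without a real repository. The sketch below is an assumption-laden stand-in: `FakeAgent` and `SketchSession` are hypothetical, commits are recorded in a list instead of being made with git, and deriving `_agent_handles_git` from `supports_native_git()` is a guess at how the real session sets that flag.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CodingResult:
    # Local copy of the fields documented on this page.
    success: bool
    output: str
    error: Optional[str]
    files_changed: List[str]
    cost: float
    commit_message: Optional[str]


class FakeAgent:
    """Hypothetical agent whose git mode and outcome are fixed at construction."""

    def __init__(self, native_git: bool, succeed: bool = True) -> None:
        self._native_git = native_git
        self._succeed = succeed

    def supports_native_git(self) -> bool:
        return self._native_git

    def generate_code(self, prompt: str, debug_mode: bool = False) -> CodingResult:
        return CodingResult(
            success=self._succeed,
            output="print('hello')" if self._succeed else "",
            error=None if self._succeed else "generation failed",
            files_changed=["main.py"] if self._succeed else [],
            cost=0.0,
            commit_message="Update main.py" if self._succeed else None,
        )


class SketchSession:
    """Stand-in for ExperimentSession that records commits in memory."""

    def __init__(self, coding_agent: FakeAgent) -> None:
        self.coding_agent = coding_agent
        # Assumption: the flag comes from the agent's own capability report.
        self._agent_handles_git = coding_agent.supports_native_git()
        self.commits: List[str] = []

    def generate_code(self, prompt: str, debug_mode: bool = False) -> CodingResult:
        result = self.coding_agent.generate_code(prompt, debug_mode)
        # Commit only for session-git agents, and only after a successful run.
        if not self._agent_handles_git and result.success:
            self._commit_with_message(result)
        return result

    def _commit_with_message(self, result: CodingResult) -> None:
        # A real session would run git here; the fallback message is made up.
        self.commits.append(result.commit_message or "Update generated code")


session_git = SketchSession(FakeAgent(native_git=False))
session_git.generate_code("add a main script")

native_git = SketchSession(FakeAgent(native_git=True))
native_git.generate_code("add a main script")

failed = SketchSession(FakeAgent(native_git=False, succeed=False))
failed.generate_code("add a main script")
```

Only the first session records a commit: the native-git agent is left to commit on its own, and the failed run produces nothing to commit.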

Agent Configuration

From agents.yaml:
agents:
  aider:
    description: "Aider CLI code assistant"
    adapter_module: "src.execution.coding_agents.adapters.aider_adapter"
    adapter_class: "AiderAdapter"
    default_model: "gpt-4.1"
    default_debug_model: "gpt-4.1-mini"
    supports_native_git: true
    
  gemini:
    description: "Google Gemini API"
    adapter_module: "src.execution.coding_agents.adapters.gemini_adapter"
    adapter_class: "GeminiAdapter"
    default_model: "gemini-2.5-pro"
    supports_native_git: false

default_agent: "aider"
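
How the factory turns `adapter_module` and `adapter_class` into a class is not shown on this page. One common pattern for such registries is a dynamic import, sketched below. The helper names `resolve_agent_entry` and `load_adapter_class` are hypothetical, the registry is a plain dict instead of parsed YAML, and the adapter targets are standard-library stand-ins so the snippet runs without the project installed.

```python
import importlib
from typing import Any, Dict, Optional

# Dict form of the agents.yaml layout. The module/class values are stdlib
# placeholders, not the real adapter paths.
REGISTRY: Dict[str, Any] = {
    "agents": {
        "aider": {
            "adapter_module": "collections",
            "adapter_class": "OrderedDict",
            "supports_native_git": True,
        },
        "gemini": {
            "adapter_module": "collections",
            "adapter_class": "Counter",
            "supports_native_git": False,
        },
    },
    "default_agent": "aider",
}


def resolve_agent_entry(
    registry: Dict[str, Any], agent_type: Optional[str] = None
) -> Dict[str, Any]:
    """Return the registry entry for agent_type, or for default_agent if omitted."""
    name = agent_type or registry["default_agent"]
    try:
        return registry["agents"][name]
    except KeyError:
        known = ", ".join(sorted(registry["agents"]))
        raise ValueError(f"Unknown agent {name!r}; available: {known}") from None


def load_adapter_class(entry: Dict[str, Any]) -> type:
    """Import the entry's module and return the named class from it."""
    module = importlib.import_module(entry["adapter_module"])
    return getattr(module, entry["adapter_class"])


default_cls = load_adapter_class(resolve_agent_entry(REGISTRY))
gemini_cls = load_adapter_class(resolve_agent_entry(REGISTRY, "gemini"))
```

Keeping the module path in configuration means adding an agent needs only a new registry entry and an adapter module, with no change to the lookup code.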

Debug Mode

When debug_mode=True:
  • The agent may use a smaller, faster model (debug_model in the config)
  • Prompts focus on fixing errors
  • Runs are typically faster and cheaper
# From agent config
model = config.debug_model if debug_mode else config.model
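
Not every agent defines a debug model (the `gemini` entry in agents.yaml has no `default_debug_model`), so the one-liner above would yield `None` for such a config. A slightly more defensive sketch follows. `AgentConfig` and `select_model` are hypothetical names, and falling back to the main model is an assumption about the desired behaviour.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AgentConfig:
    # Hypothetical config holder with the two model fields used on this page.
    model: str
    debug_model: Optional[str] = None


def select_model(config: AgentConfig, debug_mode: bool) -> str:
    """Pick the debug model when requested and configured, else the main model."""
    if debug_mode and config.debug_model:
        return config.debug_model
    return config.model


aider_cfg = AgentConfig(model="gpt-4.1", debug_model="gpt-4.1-mini")
gemini_cfg = AgentConfig(model="gemini-2.5-pro")  # no debug model configured

normal_choice = select_model(aider_cfg, debug_mode=False)
debug_choice = select_model(aider_cfg, debug_mode=True)
fallback_choice = select_model(gemini_cfg, debug_mode=True)
```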

Cost Tracking

Each agent tracks API costs:
def get_cumulative_cost(self):
    return self._total_cost
The orchestrator aggregates:
total = (
    orchestrator.llm.get_cumulative_cost()
    + workspace.get_cumulative_cost()  # From coding agents
    + handler.llm.get_cumulative_cost()
)
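
The accumulate-then-sum pattern can be sketched as follows. `CostTracker`, its `record` method, and `aggregate_costs` are hypothetical helpers for illustration; only `get_cumulative_cost` comes from this page, and the dollar amounts are made up.

```python
class CostTracker:
    """Hypothetical helper that accumulates per-call API costs."""

    def __init__(self) -> None:
        self._total_cost = 0.0

    def record(self, cost: float) -> None:
        # Reject negative values so a bad reading cannot lower the total.
        if cost < 0:
            raise ValueError("cost must be non-negative")
        self._total_cost += cost

    def get_cumulative_cost(self) -> float:
        return self._total_cost


def aggregate_costs(*sources: CostTracker) -> float:
    """Sum get_cumulative_cost() over any objects that expose it."""
    return sum(source.get_cumulative_cost() for source in sources)


# Three independent cost sources, mirroring the orchestrator's sum.
orchestrator_llm = CostTracker()
coding_agents = CostTracker()
handler_llm = CostTracker()

orchestrator_llm.record(0.5)
coding_agents.record(0.25)
coding_agents.record(0.125)
handler_llm.record(0.125)

total = aggregate_costs(orchestrator_llm, coding_agents, handler_llm)
```

Because each source only exposes a running total, the aggregate can be read at any point during a run without resetting the individual trackers.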