Overview
The Feedback Generator is an LLM-based component that validates evaluation results and decides whether to continue or stop the experimentation loop. It replaces the traditional evaluator and stop condition components with a more flexible, intelligent approach.How It Works
After the developer agent implements a solution and runs evaluation, the feedback generator:- Validates the evaluation: Checks if the agent-built evaluation is fair and correct
- Extracts the score: Parses evaluation output to get numeric scores
- Checks goal completion: Determines if the goal has been achieved
- Generates feedback: Provides actionable suggestions for the next iteration
FeedbackResult
The feedback generator returns aFeedbackResult:
Responsibilities
1. Validate Evaluation
The feedback generator checks if the agent-built evaluation is fair and correct:- Does the evaluation actually test the goal criteria?
- Is the evaluation not trivially passing (e.g.,
print("SCORE: 1.0"))? - Are the metrics appropriate for the problem?
evaluation_valid=False and provides feedback to fix the evaluation.
2. Check Goal Completion
The feedback generator determines if the goal has been achieved by:- Parsing the evaluation output for success criteria
- Comparing scores against thresholds mentioned in the goal
- Understanding semantic success (e.g., “all tests passed”)
3. Extract Score
The feedback generator parses the evaluation output to extract numeric scores:4. Generate Feedback
If the goal is not achieved, the feedback generator provides actionable suggestions:Usage
Automatic (via evolve)
The feedback generator is automatically used when you callkapso.evolve():
Direct Usage
Configuration
The feedback generator is integrated within the search strategy. You can configure the agent type in the search strategy configuration:Integration with Search Strategy
Feedback generation happens within the search strategy, not the orchestrator. This allows:- Per-node feedback in tree search (each node gets its own feedback)
- Unified flow where implementation and feedback are tightly coupled
- Cleaner orchestrator that just checks
node.should_stop
Comparison with Legacy Evaluators
| Aspect | Legacy Evaluators | Feedback Generator |
|---|---|---|
| Configuration | Predefined patterns (regex, JSON) | LLM understands any format |
| Flexibility | Fixed evaluation logic | Adapts to any domain |
| Stop Decision | Separate stop condition | Integrated in feedback |
| Validation | None | Validates evaluation fairness |
| Feedback | Simple score | Actionable suggestions |
Best Practices
1. Include Success Criteria in Goal
The feedback generator works best when the goal includes clear success criteria:2. Let Agent Build Evaluation
Don’t provideeval_dir unless you have specific evaluation requirements. The agent builds domain-appropriate evaluation: