## Overview
The Knowledge Learning Pipeline is a two-stage process that transforms raw sources (repositories, research, experiments) into structured wiki pages in the Knowledge Graph.

## Using the Pipeline
### Full Pipeline
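The source leaves this section without an example. As a hedged, self-contained sketch of the two-stage flow only (`WikiPage`, `extract`, `merge`, and `run_pipeline` are illustrative stand-ins, not the verified API):

```python
from dataclasses import dataclass

# Hypothetical sketch of the full pipeline: Stage 1 extracts WikiPages
# from sources, Stage 2 merges them into the Knowledge Graph. All names
# here are illustrative assumptions.

@dataclass
class WikiPage:
    title: str
    page_type: str
    content: str = ""

def extract(sources):
    # Stage 1: each source is handed to its ingestor, which emits WikiPages.
    return [WikiPage(title=f"{src} overview", page_type="Principle")
            for src in sources]

def merge(pages):
    # Stage 2: proposed pages become CREATE_NEW (novel knowledge) or
    # MERGE (folded into an existing page); trivially all-new here.
    return {"created": [p.title for p in pages], "edited": []}

def run_pipeline(sources):
    return merge(extract(sources))

result = run_pipeline(["https://github.com/example/project"])
```

The real pipeline wires the ingestors and merger described in the rest of this page into these two stages.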
### Via Kapso API
### Extract Only (No Merge)
## Source Types
The `Source` namespace provides typed wrappers for knowledge inputs:
| Source Type | Description | Status |
|---|---|---|
| `Source.Repo(url, branch="main")` | Git repository | Implemented |
| `Source.Solution(solution)` | Completed experiment | Basic |
| `Source.Idea(query, source, content)` | Research idea | Implemented |
| `Source.Implementation(query, source, content)` | Code implementation | Implemented |
| `Source.ResearchReport(query, content)` | Research report | Implemented |
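The wrappers in the table could be modeled as simple dataclasses. This is a hedged sketch built only from the constructor signatures shown above; the real wrappers may carry extra fields or validation:

```python
from dataclasses import dataclass

# Illustrative model of the Source namespace; fields mirror the table above.

class Source:
    @dataclass
    class Repo:
        url: str
        branch: str = "main"

    @dataclass
    class Solution:
        solution: object

    @dataclass
    class Idea:
        query: str
        source: str
        content: str

    @dataclass
    class Implementation:
        query: str
        source: str
        content: str

    @dataclass
    class ResearchReport:
        query: str
        content: str

repo = Source.Repo("https://github.com/example/project")
```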
## Stage 1: Ingestors
Ingestors extract WikiPages from sources. Each source type has a dedicated ingestor.

### IngestorFactory
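The factory itself is not shown in the source. A minimal sketch of how a type-to-ingestor registry might dispatch (the `create` method and the stub ingestor bodies are assumptions; the type keys match the Ingestor table below):

```python
# Hypothetical IngestorFactory: a registry maps source-type keys to
# ingestor classes. Stub classes stand in for the real ingestors.

class RepoIngestor: ...
class IdeaIngestor: ...
class ImplementationIngestor: ...
class ResearchReportIngestor: ...

class IngestorFactory:
    _registry = {
        "repo": RepoIngestor,
        "idea": IdeaIngestor,
        "implementation": ImplementationIngestor,
        "researchreport": ResearchReportIngestor,
    }

    @classmethod
    def create(cls, source_type: str):
        try:
            return cls._registry[source_type]()
        except KeyError:
            raise ValueError(f"no ingestor for source type {source_type!r}")
```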
### RepoIngestor
The most sophisticated ingestor, using a two-branch multi-phase pipeline:

- Phase 0: Repository Understanding (pre-phase)
  - Parse repo structure, generate `_RepoMap.md` with AST info
  - Agent fills in natural language understanding for each file
  - Subsequent phases read this file instead of re-exploring
- Phase 1a: Anchoring - Find workflows from README/examples, write Workflow pages + rough WorkflowIndex
- Phase 1b: Anchoring Context - Enrich WorkflowIndex with detailed implementation context
- Phase 2: Excavation+Synthesis - Trace imports, write Implementation-Principle pairs together
- Phase 3: Enrichment - Mine constraints/tips, write Environment/Heuristic pages
- Phase 4: Audit - Validate graph integrity, fix broken links
- Phase 4b: Repo Builder - Create GitHub repositories for workflows
- Phase 5a: Triage (code) - Deterministic filtering into AUTO_KEEP/AUTO_DISCARD/MANUAL_REVIEW
- Phase 5b: Review (agent) - Agent evaluates MANUAL_REVIEW files
- Phase 5c: Create (agent) - Agent creates wiki pages for approved files
- Phase 5d: Verify (code) - Verify all approved files have pages
- Phase 6: Orphan Audit - Validate orphan nodes
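Phase 5a's deterministic triage can be pictured as a pure function over file paths. The concrete rules below (skip hidden and generated files, auto-keep source and docs files) are invented for illustration; the source does not state the actual criteria:

```python
def triage(paths):
    """Bucket files into AUTO_KEEP / AUTO_DISCARD / MANUAL_REVIEW in the
    spirit of Phase 5a. The rules here are illustrative guesses."""
    buckets = {"AUTO_KEEP": [], "AUTO_DISCARD": [], "MANUAL_REVIEW": []}
    for path in paths:
        name = path.rsplit("/", 1)[-1]
        if name.startswith(".") or name.endswith((".lock", ".min.js")):
            buckets["AUTO_DISCARD"].append(path)   # hidden/generated files
        elif name.endswith((".py", ".md")):
            buckets["AUTO_KEEP"].append(path)      # clearly substantive files
        else:
            buckets["MANUAL_REVIEW"].append(path)  # left for the Phase 5b agent
    return buckets
```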
### Research Ingestors
Research ingestors convert web research results into WikiPages using a three-phase agentic pipeline:

- Phase 1: Planning
  - Analyzes content and decides what pages to create
  - Writes `_plan.md` with page decisions
- Phase 2: Page Creation
  - Creates wiki pages following section definitions
  - Writes pages to staging directory
- Phase 3: Validation
  - Validates pages and fixes issues
  - Ensures graph integrity
| Ingestor | Source Type | Description |
|---|---|---|
| `IdeaIngestor` | idea | Extracts principles from research ideas |
| `ImplementationIngestor` | implementation | Extracts implementations from code examples |
| `ResearchReportIngestor` | researchreport | Extracts comprehensive knowledge from reports |
## Stage 2: Knowledge Merger

The merger uses a hierarchical sub-graph-aware algorithm with a single Claude Code agent call. It processes connected pages as units, respecting the Knowledge Graph DAG structure.

### Wiki Hierarchy

The Knowledge Graph follows a top-down DAG structure.

### Merge Algorithm
The merger executes a 5-phase process:

- Phase 1: Sub-Graph Detection
  - Parses `outgoing_links` to build an adjacency list
  - Finds root nodes (no incoming edges from proposed pages)
  - Groups connected components into sub-graphs
- Phase 2: Merge Decisions
  - For each sub-graph, makes merge decisions starting from the root:
    - Root decision: Search for similar pages of the same type → `MERGE` or `CREATE_NEW`
    - Children decisions (recursive):
      - If parent = `CREATE_NEW` → child inherits `CREATE_NEW` (no search needed)
      - If parent = `MERGE` → search only among the target's children → `MERGE` or `CREATE_NEW`
    - Special case: Heuristics with multiple parents use the lowest parent for scoped search
- Phase 3: Execution Planning
  - Computes execution order (bottom-up): Environment → Heuristic → Implementation → Principle
  - Records deferred edges (which parent adds an edge after processing)
- Phase 4: Execution
  - Processes nodes in computed order
  - For `CREATE_NEW`: Get page structure, prepare content, call `kg_index`
  - For `MERGE`: Get page structure, fetch target, merge content intelligently, call `kg_edit`
  - Updates `outgoing_links` to point to processed children's `result_page_id`
- Phase 5: Verification and Summary
  - Verifies nodes exist (`CREATE_NEW`/`MERGE`)
  - Verifies edges (parent has an edge to the child's `result_page_id`)
  - Handles failures with retries (max 3)
  - Collects all `result_page_id` values
  - Categorizes pages as created (`CREATE_NEW`) or edited (`MERGE`)
  - Writes a final summary to `_merge_plan.md`
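Phase 1 (sub-graph detection) can be sketched as plain graph code over the proposed pages' `outgoing_links`. The dict-of-lists page shape is an assumption for illustration:

```python
def detect_subgraphs(pages):
    """Group proposed pages into connected components and find their roots.
    `pages` maps page id -> list of outgoing_links (ids of other proposed
    pages); this shape is assumed, not the pipeline's real structure."""
    # Roots: nodes with no incoming edges from other proposed pages.
    has_incoming = {child for links in pages.values() for child in links}
    roots = [pid for pid in pages if pid not in has_incoming]

    # Connected components over the undirected version of the link graph.
    neighbors = {pid: set(links) for pid, links in pages.items()}
    for pid, links in pages.items():
        for child in links:
            neighbors.setdefault(child, set()).add(pid)

    seen, components = set(), []
    for pid in pages:
        if pid in seen:
            continue
        stack, component = [pid], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbors.get(node, ()))
        seen |= component
        components.append(component)
    return roots, components
```

Each component then becomes one sub-graph, and decisions proceed from its root as described in Phase 2.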
### Merge Actions
| Action | Description |
|---|---|
| `CREATE_NEW` | New page for novel knowledge |
| `MERGE` | Update existing page with new content |
### Edge Types
| From | Edge Type | To |
|---|---|---|
| Principle | implemented_by | Implementation |
| Principle | uses_heuristic | Heuristic |
| Implementation | requires_env | Environment |
| Implementation | uses_heuristic | Heuristic |
### Important Rules

- Same-type search only: Principles search among Principles, Implementations among Implementations, etc.
- Scoped search: When the parent is `MERGE`, children search only among the target's children
- Inherited CREATE_NEW: If the parent is `CREATE_NEW`, all descendants are `CREATE_NEW` (no search)
- Additive edges: When merging, keep existing edges and add new ones
- Bottom-up execution: Process leaves (Environment, Heuristic) before parents
- No cross-type merges: Never merge a Principle with an Implementation, etc.
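The first three rules can be condensed into one recursive decision function. `search_fn` stands in for the same-type, scoped similarity search and is an assumption, as is the dict-based plan shape:

```python
def decide(node, children, search_fn, parent_decision=None, parent_target=None):
    """Apply the merge-decision rules recursively (a sketch).
    `search_fn(node, scope)` should return a similar existing page id or
    None; `scope` is None for a root-level search, or the MERGE target
    whose children bound a scoped search."""
    if parent_decision == "CREATE_NEW":
        # Inherited CREATE_NEW: descendants of a new page skip the search.
        decision, target = "CREATE_NEW", None
    else:
        scope = parent_target if parent_decision == "MERGE" else None
        target = search_fn(node, scope)
        decision = "MERGE" if target else "CREATE_NEW"
    plan = {node: (decision, target)}
    for child in children.get(node, ()):
        plan.update(decide(child, children, search_fn, decision, target))
    return plan
```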
### MCP Tools Used

The merger uses these MCP tools via the `kg-graph-search` server:
| Tool | Purpose |
|---|---|
| `search_knowledge` | Find similar pages in the graph |
| `get_wiki_page` | Read existing page content by title |
| `get_page_structure` | Get the sections definition for a page type |
| `kg_index` | Create a new page in the graph |
| `kg_edit` | Update an existing page in the graph |
## Using the Merger
### Merge Modes
The merger operates in two modes, based on whether a KG index exists:

- No Index Mode: Creates all pages as new, writes them to the wiki directory, then creates an index
- Merge Mode: Runs the agentic hierarchical merge using MCP tools

The KG index is located via either:

- An explicit `kg_index_path` from config
- An auto-detected `.index` file in the wiki directory
### Merge Result
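This section has no schema in the source. Based on the algorithm's Phase 5 (created vs. edited `result_page_id` values, summary in `_merge_plan.md`), a plausible shape is the following; all field names are assumptions, not the verified structure:

```python
from dataclasses import dataclass, field

# Hypothetical MergeResult, inferred from the merge algorithm above.

@dataclass
class MergeResult:
    created: list[str] = field(default_factory=list)  # result_page_ids of CREATE_NEW pages
    edited: list[str] = field(default_factory=list)   # result_page_ids of MERGE targets
    summary_path: str = "_merge_plan.md"              # final summary location

result = MergeResult(created=["page-123"], edited=["page-456"])
```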
### WikiPage Structure
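The page structure itself is not reproduced in the source. A hedged sketch, inferred from how pages are used throughout this document (typed pages carrying `outgoing_links`); the field names are assumptions:

```python
from dataclasses import dataclass, field

# Hypothetical WikiPage shape; the real structure may differ.

@dataclass
class WikiPage:
    title: str
    page_type: str          # Principle, Implementation, Environment, Heuristic, ...
    content: str = ""
    outgoing_links: list[str] = field(default_factory=list)

page = WikiPage(title="Caching strategy", page_type="Principle",
                outgoing_links=["Cache implementation"])
```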
## CLI Usage
### CLI Options
| Option | Short | Description |
|---|---|---|
| `--type` | `-t` | Source type: repo, paper, solution |
| `--branch` | `-b` | Git branch (default: main) |
| `--extract-only` | `-e` | Only extract, don't merge |
| `--wiki-dir` | `-w` | Wiki directory path |
| `--verbose` | `-v` | Enable verbose logging |
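The options in the table map naturally onto `argparse`. This sketch assumes a positional `source` argument and the shown defaults, neither of which the table confirms:

```python
import argparse

# Illustrative parser mirroring the CLI options table; the positional
# `source` argument and default values are assumptions.

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Knowledge Learning Pipeline")
    parser.add_argument("source", help="Repository URL or input path (assumed)")
    parser.add_argument("-t", "--type", choices=["repo", "paper", "solution"],
                        default="repo", help="Source type")
    parser.add_argument("-b", "--branch", default="main", help="Git branch")
    parser.add_argument("-e", "--extract-only", action="store_true",
                        help="Only extract, don't merge")
    parser.add_argument("-w", "--wiki-dir", help="Wiki directory path")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="Enable verbose logging")
    return parser

args = build_parser().parse_args(
    ["https://github.com/example/project", "-t", "repo", "-e"])
```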