Build Repo Context

Crawl GitHub history (PRs, issues, review comments) and distill institutional knowledge into agent_artefacts/repo_context/REPO_CONTEXT.md. This document helps worker agents understand repo conventions, common mistakes, and known tech debt before making changes.

Workflow

1. Setup

Create agent_artefacts/repo_context/ if it doesn't exist
Read existing agent_artefacts/repo_context/REPO_CONTEXT.md if present (will be updated, not replaced)

2. Identify What's New

Use the header of REPO_CONTEXT.md to determine what to process. The header contains the last-updated date and PR range (e.g., PRs processed: #965-#1050).

First run (no REPO_CONTEXT.md): Fetch the most recent 50 merged PRs + all open issues
Incremental runs: Fetch PRs merged after the highest PR number in the header, and issues updated since the last-updated date

Use the gh CLI to list candidates:

# First run: recent merged PRs
gh pr list --state merged --limit 50 --json number,title,labels,additions,deletions,reviewDecision,mergedAt

# Incremental: PRs merged since last crawl
gh pr list --state merged --search "merged:>YYYY-MM-DD" --limit 50 --json number,title,labels,additions,deletions,reviewDecision,mergedAt

# Open issues
gh issue list --state open --limit 100 --json number,title,labels,createdAt,updatedAt

3. Triage

Fast pass over PR titles and metadata. Skip these categories (they rarely contain design insights):

Dependency bumps (titles matching bump, update dependencies, renovate, dependabot)
Changelog-only updates (titles matching changelog, scriv)
Bot-generated PRs with no review comments
PRs with fewer than 5 lines changed and no review comments

Prioritize PRs that have:

Review comments (especially multiple rounds — that's where design discussion lives)
Changes touching shared utilities (src/inspect_evals/utils/, CONTRIBUTING.md, BEST_PRACTICES.md, AGENTS.md)

Cap at 50 PRs per run to keep execution time reasonable.

4. Extract

For each selected PR, fetch:

# PR body and metadata
gh pr view <N> --json body,title,labels,files,reviewDecision,comments,reviews

# Review comments (inline code review feedback)
gh api repos/{owner}/{repo}/pulls/<N>/comments --paginate

# Issue comments (general discussion)
gh api repos/{owner}/{repo}/issues/<N>/comments --paginate

For open issues, fetch body and comments similarly.

Link traversal: If a comment references another PR/issue (e.g., "see #123" or "fixed in #456"), continue to crawl recursively up to 3 hops in total. Do not recurse to an existing PR/issue in the chain to prevent loops.

5. Distill

This is the core intellectual work. For each PR/issue, extract actionable insights in these categories:

Design decisions: What architectural choice was made and why? What alternatives were rejected?
Reviewer corrections: What mistakes did reviewers catch? These reveal common pitfalls.
Established conventions: What patterns were deliberately chosen that future contributors should follow?
Tech debt acknowledged: What shortcuts were taken intentionally? What should NOT be "fixed" without discussion?
Common agent mistakes: If review comments mention agent-generated code issues, capture the pattern.

Quality requirements for each insight:

Must cite source PR/issue number (e.g., "Per PR #973...")
Must be actionable ("Do X" / "Don't do Y"), not descriptive ("PR #123 added X")
Must add nuance beyond what CONTRIBUTING.md and BEST_PRACTICES.md already state
Must be relevant to future contributors, not just historically interesting
Must be broadly applicable beyond a single issue or evaluation. If the context is excessively narrow, leave it out.
Must reflect team convention, not a single maintainer's code style or proposal. If in doubt, leave it out.

Skip:

Bot comments (dependabot, renovate, CI status checks)
Feature announcements without design implications
Trivial PRs (typo fixes, version bumps) unless they reveal a convention
Duplicate insights already captured in REPO_CONTEXT.md

6. Merge Into REPO_CONTEXT.md

Integrate new insights into the existing document structure. Do not just append — place each insight in the appropriate section and deduplicate:

If a new insight updates or supersedes an existing one, replace it
If a section is getting too long, distill further (combine related insights)
Update the header metadata (last updated date, PR watermark)
Keep total document size between 500-1000 lines (aggressive distillation if over)

Each insight appears in exactly one section — do not repeat the same rule across multiple sections with different framing (see step 7).

7. Deduplicate & Consolidate

After merging, review the full document for cross-section duplication. This is critical — incremental runs naturally introduce duplication because the same convention surfaces in multiple PR reviews (e.g., "use @pytest.mark.docker" might appear as a reviewer correction, an established convention, AND a testing recipe).

Process:

For each insight, search the entire document for overlapping content. Look for insights that cover the same topic even if phrased differently.
Keep each insight in exactly one location — the most specific section that fits. Prefer this priority:
- "Rules & Conventions" for mandatory practices ("always do X", "never do Y")
- "Testing Recipes" for detailed how-to patterns (mock setup, test structure)
- "Known Tech Debt" for acknowledged issues that should not be fixed without discussion
- "CI/Tooling" for build/CI/tooling specifics
- "Open Issues" for bugs and design direction
Remove the duplicate occurrences, keeping the most complete/specific version.
Combine related insights that are split across bullets into a single, richer bullet.

Common duplication patterns to watch for:

The same pytest marker rule appearing in both "Rules" and "Testing Recipes"
Reviewer corrections that duplicate established conventions (merge into the convention)
Agent mistakes that are just the inverse of an established convention (keep only the convention)
API usage patterns appearing in both rules and recipes (keep the rule brief, detail in recipes)

Bounding Rules

Rule	Limit
First run scope	Most recent 50 merged PRs + all open issues
Incremental run scope	New items since last crawl
Max PRs per run	50
Link traversal depth	3 hops
Target REPO_CONTEXT.md size	500-1000 lines
Max issues per run	100

Insight Quality Guidelines

These are critical — the value of REPO_CONTEXT.md depends on insight quality:

Every insight must cite its source PR or issue number. It is acceptable to cite multiple sources for the same insight.
Insights must be actionable: "Do X" / "Don't do Y", not "PR #123 added X"
Don't duplicate existing docs: Only add nuance that CONTRIBUTING.md and BEST_PRACTICES.md miss
Skip noise: Bot comments, feature announcements without design implications, trivial PRs
Focus on: Reviewer corrections, design trade-offs, rejected alternatives, acknowledged tech debt, common agent mistakes
Be specific: "Use hf_dataset() wrapper instead of raw load_dataset() for HuggingFace datasets (PR #842)" is better than "Use the right dataset loading function"
Date-stamp volatile insights: If an insight might become stale (e.g., "Currently X is broken"), include the date so agents can verify

Expected Output

After running this workflow:

agent_artefacts/repo_context/
└── REPO_CONTEXT.md     # Distilled institutional knowledge (committed)

Verification Checklist

After each run, verify:

REPO_CONTEXT.md exists and has well-structured content
Insights cite source PR/issue numbers
Insights are actionable, not merely descriptive
No duplicate insights across sections — search for key terms (e.g., sample ID, get_model, @pytest.mark) and confirm each appears in exactly one place
Document stays under ~1000 lines
Header metadata (date, PR range) is updated
Incremental runs don't reprocess already-crawled PRs

build-repo-context

details