build-repo-context

Crawl repository PRs, issues, and review comments to distill institutional knowledge into a shared knowledge base. Run periodically by "context agents" to maintain agent_artefacts/repo_context/REPO_CONTEXT.md. Trigger only on specific request.

$/plugin install inspect_evals

Build Repo Context

Crawl GitHub history (PRs, issues, review comments) and distill institutional knowledge into agent_artefacts/repo_context/REPO_CONTEXT.md. This document helps worker agents understand repo conventions, common mistakes, and known tech debt before making changes.

Workflow

1. Setup

  1. Create agent_artefacts/repo_context/ if it doesn't exist
  2. Read the existing agent_artefacts/repo_context/REPO_CONTEXT.md if present (it will be updated, not replaced)

2. Identify What's New

Use the header of REPO_CONTEXT.md to determine what to process. The header contains the last-updated date and PR range (e.g., PRs processed: #965-#1050).

  • First run (no REPO_CONTEXT.md): Fetch the most recent 50 merged PRs + all open issues
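  • Incremental runs: Fetch PRs merged since the last crawl (search by the last-updated date, and use the highest PR number in the header as the watermark for what has already been processed), and issues updated since the last-updated date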
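The first-run versus incremental decision can be sketched in Python as below. This is a minimal illustration, not part of the skill: the `Last updated: YYYY-MM-DD` line is an assumed header layout (the skill only gives the `PRs processed: #965-#1050` example), and `Watermark`, `read_watermark`, and `merged_search_filter` are hypothetical names.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Assumed header layout (hypothetical): the skill only says the header holds a
# last-updated date and a PR range such as "PRs processed: #965-#1050".
DATE_RE = re.compile(r"Last updated:\s*(\d{4}-\d{2}-\d{2})", re.IGNORECASE)
RANGE_RE = re.compile(r"PRs processed:\s*#(\d+)\s*-\s*#(\d+)", re.IGNORECASE)


@dataclass
class Watermark:
    last_updated: date
    lowest_pr: int
    highest_pr: int


def read_watermark(header_text: str) -> Optional[Watermark]:
    """Return the crawl watermark, or None when this looks like a first run."""
    date_match = DATE_RE.search(header_text)
    range_match = RANGE_RE.search(header_text)
    if not (date_match and range_match):
        return None
    return Watermark(
        last_updated=date.fromisoformat(date_match.group(1)),
        lowest_pr=int(range_match.group(1)),
        highest_pr=int(range_match.group(2)),
    )


def merged_search_filter(mark: Watermark) -> str:
    """Build the value passed to `gh pr list --search` for an incremental run."""
    return f"merged:>{mark.last_updated.isoformat()}"
```

A `None` result corresponds to the first-run path (most recent 50 merged PRs plus all open issues); otherwise the returned filter string slots into the incremental `gh pr list --search` command.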

Use the gh CLI to list candidates:

# First run: recent merged PRs
gh pr list --state merged --limit 50 --json number,title,labels,additions,deletions,reviewDecision,mergedAt

# Incremental: PRs merged since last crawl
gh pr list --state merged --search "merged:>YYYY-MM-DD" --limit 50 --json number,title,labels,additions,deletions,reviewDecision,mergedAt

# Open issues
gh issue list --state open --limit 100 --json number,title,labels,createdAt,updatedAt

3. Triage

Fast pass over PR titles and metadata. Skip these categories (they rarely contain design insights):

  • Dependency bumps (titles matching bump, update dependencies, renovate, dependabot)
  • Changelog-only updates (titles matching changelog, scriv)
  • Bot-generated PRs with no review comments
  • PRs with fewer than 5 lines changed and no review comments

Prioritize PRs that have:

  • Review comments (especially multiple rounds — that's where design discussion lives)
  • Changes touching shared utilities (src/inspect_evals/utils/, CONTRIBUTING.md, BEST_PRACTICES.md, AGENTS.md)

Cap at 50 PRs per run to keep execution time reasonable.
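The skip rules and the per-run cap could be expressed roughly as follows. This is a sketch under assumptions: `review_comments` and `is_bot` are hypothetical fields that the caller would have to add to each record (the `gh pr list` fields above do not include them), title matching is a blunt case-insensitive substring search, and the "touches shared utilities" priority is left out for brevity.

```python
import re
from typing import Iterable

# Title patterns taken from the skip list above; substring matching is an
# assumption and may need tightening for a real repository.
SKIP_TITLE_RE = re.compile(
    r"bump|update dependencies|renovate|dependabot|changelog|scriv",
    re.IGNORECASE,
)
MAX_PRS_PER_RUN = 50
MIN_LINES_CHANGED = 5


def should_skip(pr: dict) -> bool:
    """Apply the skip rules to one PR record."""
    # `review_comments` and `is_bot` are hypothetical fields supplied by the caller.
    has_reviews = pr.get("review_comments", 0) > 0
    if SKIP_TITLE_RE.search(pr["title"]):
        return True
    if pr.get("is_bot", False) and not has_reviews:
        return True
    lines_changed = pr.get("additions", 0) + pr.get("deletions", 0)
    return lines_changed < MIN_LINES_CHANGED and not has_reviews


def triage(prs: Iterable[dict]) -> list[dict]:
    """Drop skipped PRs, put the most-reviewed first, and cap the batch."""
    kept = [pr for pr in prs if not should_skip(pr)]
    kept.sort(key=lambda pr: pr.get("review_comments", 0), reverse=True)
    return kept[:MAX_PRS_PER_RUN]
```

Sorting by review-comment count is only a proxy for "multiple rounds of review"; an agent doing this by hand would also weigh which files a PR touches.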

4. Extract

For each selected PR, fetch:

# PR body and metadata
gh pr view <N> --json body,title,labels,files,reviewDecision,comments,reviews

# Review comments (inline code review feedback)
gh api repos/{owner}/{repo}/pulls/<N>/comments --paginate

# Issue comments (general discussion)
gh api repos/{owner}/{repo}/issues/<N>/comments --paginate

For open issues, fetch body and comments similarly.

Link traversal: If a comment references another PR/issue (e.g., "see #123" or "fixed in #456"), follow the reference and crawl it recursively, up to 3 hops in total from the starting PR/issue. To prevent loops, do not revisit a PR/issue that is already in the chain.
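One way to picture the traversal rule is a breadth-first walk with a visited set and a depth limit. The sketch below assumes that the starting PR/issue counts as hop 0 and that references look like `#123`; `fetch_comments` is a hypothetical callable (for example, a wrapper around the `gh api` calls above) that returns the comment bodies for a given number.

```python
import re
from collections import deque
from typing import Callable, Iterable

REF_RE = re.compile(r"#(\d+)")
MAX_HOPS = 3


def crawl_references(
    start: int,
    fetch_comments: Callable[[int], Iterable[str]],
    max_hops: int = MAX_HOPS,
) -> dict[int, int]:
    """Breadth-first walk over "#N" references.

    Returns a mapping of PR/issue number -> hop distance from `start`.
    """
    hops = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if hops[current] >= max_hops:
            continue  # depth limit reached; do not expand further
        for body in fetch_comments(current):
            for match in REF_RE.finditer(body):
                ref = int(match.group(1))
                if ref not in hops:  # already seen: skip to avoid loops
                    hops[ref] = hops[current] + 1
                    queue.append(ref)
    return hops
```

Because every number is recorded the first time it is seen, a chain such as #1 → #2 → #1 terminates without any extra bookkeeping.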

5. Distill

This is the core intellectual work. For each PR/issue, extract actionable insights in these categories:

  • Design decisions: What architectural choice was made and why? What alternatives were rejected?
  • Reviewer corrections: What mistakes did reviewers catch? These reveal common pitfalls.
  • Established conventions: What patterns were deliberately chosen that future contributors should follow?
  • Tech debt acknowledged: What shortcuts were taken intentionally? What should NOT be "fixed" without discussion?
  • Common agent mistakes: If review comments mention agent-generated code issues, capture the pattern.

Quality requirements for each insight:

  • Must cite source PR/issue number (e.g., "Per PR #973...")
  • Must be actionable ("Do X" / "Don't do Y"), not descriptive ("PR #123 added X")
  • Must add nuance beyond what CONTRIBUTING.md and BEST_PRACTICES.md already state
  • Must be relevant to future contributors, not just historically interesting
  • Must be broadly applicable beyond a single issue or evaluation. If the context is excessively narrow, leave it out.
  • Must reflect team convention, not a single maintainer's code style or proposal. If in doubt, leave it out.

Skip:

  • Bot comments (dependabot, renovate, CI status checks)
  • Feature announcements without design implications
  • Trivial PRs (typo fixes, version bumps) unless they reveal a convention
  • Duplicate insights already captured in REPO_CONTEXT.md

6. Merge Into REPO_CONTEXT.md

Integrate new insights into the existing document structure. Do not just append — place each insight in the appropriate section and deduplicate:

  • If a new insight updates or supersedes an existing one, replace it
  • If a section is getting too long, distill further (combine related insights)
  • Update the header metadata (last updated date, PR watermark)
  • Keep the total document size between 500 and 1000 lines (distill aggressively if over)

Each insight appears in exactly one section — do not repeat the same rule across multiple sections with different framing (see step 7).
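Updating the header metadata might look like the following. It is a sketch only: the `Last updated:` line is an assumed layout, `bump_header` is a hypothetical helper, and widening the range to cover the newly processed PR numbers is one possible reading of "PR watermark".

```python
import re
from datetime import date

# Assumed header lines (hypothetical layout), e.g.:
#   Last updated: 2026-01-15
#   PRs processed: #965-#1050
RANGE_RE = re.compile(r"(PRs processed:\s*#)(\d+)(\s*-\s*#)(\d+)")
DATE_RE = re.compile(r"(Last updated:\s*)\d{4}-\d{2}-\d{2}")


def bump_header(header: str, processed: list[int], today: date) -> str:
    """Extend the PR range to cover `processed` and refresh the date."""

    def widen(match: re.Match) -> str:
        low = min(int(match.group(2)), min(processed))
        high = max(int(match.group(4)), max(processed))
        return f"{match.group(1)}{low}{match.group(3)}{high}"

    if processed:  # leave the range alone when no PRs were processed
        header = RANGE_RE.sub(widen, header, count=1)
    return DATE_RE.sub(lambda m: m.group(1) + today.isoformat(), header, count=1)
```

The date is refreshed even when no PRs were processed, since issues may still have been crawled during the run.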

7. Deduplicate & Consolidate

After merging, review the full document for cross-section duplication. This is critical — incremental runs naturally introduce duplication because the same convention surfaces in multiple PR reviews (e.g., "use @pytest.mark.docker" might appear as a reviewer correction, an established convention, AND a testing recipe).

Process:

  1. For each insight, search the entire document for overlapping content. Look for insights that cover the same topic even if phrased differently.
  2. Keep each insight in exactly one location — the most specific section that fits. Prefer this priority:
    • "Rules & Conventions" for mandatory practices ("always do X", "never do Y")
    • "Testing Recipes" for detailed how-to patterns (mock setup, test structure)
    • "Known Tech Debt" for acknowledged issues that should not be fixed without discussion
    • "CI/Tooling" for build/CI/tooling specifics
    • "Open Issues" for bugs and design direction
  3. Remove the duplicate occurrences, keeping the most complete/specific version.
  4. Combine related insights that are split across bullets into a single, richer bullet.

Common duplication patterns to watch for:

  • The same pytest marker rule appearing in both "Rules" and "Testing Recipes"
  • Reviewer corrections that duplicate established conventions (merge into the convention)
  • Agent mistakes that are just the inverse of an established convention (keep only the convention)
  • API usage patterns appearing in both rules and recipes (keep the rule brief, detail in recipes)
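As a rough aid for spotting insights that cover the same topic in different words, keyword overlap between bullets can be scored. The sketch below uses Jaccard similarity over lowercased tokens; the stop-word list, the 0.5 threshold, and the `(section, text)` tuple shape are all illustrative assumptions, and a flagged pair still needs a human or agent judgment before anything is merged.

```python
import re
from itertools import combinations

WORD_RE = re.compile(r"[a-z0-9_@.]+")
# Small illustrative stop list; a real run would likely want a longer one.
STOP_WORDS = {"the", "a", "an", "to", "for", "in", "of", "and", "or", "use", "do", "not", "per", "pr"}


def keywords(text: str) -> set[str]:
    """Lowercased tokens with stop words and stray periods removed."""
    words = (w.strip(".") for w in WORD_RE.findall(text.lower()))
    return {w for w in words if w and w not in STOP_WORDS}


def jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def find_overlaps(insights: list[tuple[str, str]], threshold: float = 0.5) -> list[tuple]:
    """Return ((section, text), (section, text), score) for near-duplicate pairs."""
    overlaps = []
    for first, second in combinations(insights, 2):
        score = jaccard(keywords(first[1]), keywords(second[1]))
        if score >= threshold:
            overlaps.append((first, second, score))
    return overlaps
```

Token overlap will miss paraphrases that share few words, so it complements the manual search described in the process above instead of replacing it.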

Bounding Rules

| Rule | Limit |
|---|---|
| First run scope | Most recent 50 merged PRs + all open issues |
| Incremental run scope | New items since last crawl |
| Max PRs per run | 50 |
| Link traversal depth | 3 hops |
| Target REPO_CONTEXT.md size | 500-1000 lines |
| Max issues per run | 100 |

Insight Quality Guidelines

These are critical — the value of REPO_CONTEXT.md depends on insight quality:

  1. Every insight must cite its source PR or issue number. It is acceptable to cite multiple sources for the same insight.
  2. Insights must be actionable: "Do X" / "Don't do Y", not "PR #123 added X"
  3. Don't duplicate existing docs: Only add nuance that CONTRIBUTING.md and BEST_PRACTICES.md miss
  4. Skip noise: Bot comments, feature announcements without design implications, trivial PRs
  5. Focus on: Reviewer corrections, design trade-offs, rejected alternatives, acknowledged tech debt, common agent mistakes
  6. Be specific: "Use hf_dataset() wrapper instead of raw load_dataset() for HuggingFace datasets (PR #842)" is better than "Use the right dataset loading function"
  7. Date-stamp volatile insights: If an insight might become stale (e.g., "Currently X is broken"), include the date so agents can verify

Expected Output

After running this workflow:

agent_artefacts/repo_context/
└── REPO_CONTEXT.md     # Distilled institutional knowledge (committed)

Verification Checklist

After each run, verify:

  1. REPO_CONTEXT.md exists and has well-structured content
  2. Insights cite source PR/issue numbers
  3. Insights are actionable, not merely descriptive
  4. No duplicate insights across sections — search for key terms (e.g., sample ID, get_model, @pytest.mark) and confirm each appears in exactly one place
  5. Document stays under ~1000 lines
  6. Header metadata (date, PR range) is updated
  7. Incremental runs don't reprocess already-crawled PRs
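Some of these checks lend themselves to a quick script. The sketch below covers only the mechanical ones (line count, citations, repeated key terms) and assumes each insight is a single-line bullet, which may not hold for the real document; `check_repo_context` is a hypothetical helper, and whether an insight is actionable still needs a reader.

```python
import re

BULLET_RE = re.compile(r"^\s*(?:[-*•]|\d+\.)\s+(.*)$")
CITATION_RE = re.compile(r"#\d+")
MAX_LINES = 1000


def check_repo_context(text: str, key_terms: list[str]) -> list[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    problems = []
    lines = text.splitlines()
    if len(lines) > MAX_LINES:
        problems.append(f"document has {len(lines)} lines (limit {MAX_LINES})")
    for number, line in enumerate(lines, start=1):
        bullet = BULLET_RE.match(line)
        # Assumption: a bullet is one insight and carries its own citation.
        if bullet and not CITATION_RE.search(bullet.group(1)):
            problems.append(f"line {number}: insight has no PR/issue citation")
    for term in key_terms:
        hits = sum(term in line for line in lines)
        if hits > 1:
            problems.append(f"'{term}' appears on {hits} lines; check for duplicates")
    return problems
```

A call such as `check_repo_context(text, ["sample ID", "get_model", "@pytest.mark"])` mirrors the key-term search suggested in item 4.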

technical

github: UKGovernmentBEIS/inspect_evals
stars: 517
license: MIT
contributors: 100
last commit: 2026-05-29T04:29:08Z
file: .claude/skills/build-repo-context/SKILL.md