prepare-submission-workflow

Prepare an evaluation for PR submission as an entry to the register. Use when the user asks to prepare an eval for submission or to finalize a PR. Trigger when the user asks you to run the "Prepare Evaluation For Submission" workflow.

$ /plugin install inspect_evals

Prepare Eval For Submission

Since May 2026, new evaluations are submitted as entries to the register — the evaluation code lives in your own upstream repository, and you add a pointer to it here. Code is no longer added directly to src/inspect_evals/. If the user appears to be submitting evaluation code into the repo, direct them to register/README.md for the full process.

Workflow Steps

To prepare an evaluation for submission as a pull request:

1. Verify upstream repo requirements

The upstream repo must:

  • Have a pyproject.toml with a [project] table so it can be installed via uv sync
  • Declare inspect_ai as a dependency
  • Define each task with the @task decorator from inspect_ai

Ask the user whether their upstream repo meets these requirements. Offer to check for them — if they provide the GitHub repository URL, fetch the repo's pyproject.toml and task files (e.g. via WebFetch on the raw GitHub URLs) to verify the requirements are met. If any requirement is not met, tell the user what needs to be fixed upstream before they can register.

2. Gather information and create register/<eval_name>/eval.yaml

Skip this step if register/<eval_name>/eval.yaml already exists.

Use register/example_eval.yaml as the template — it documents every field. Don't ask the user field-by-field; instead, derive what you can from the upstream repo first, then ask one batched question for what's missing.

Hints on what to derive from the upstream repo (don't ask):

  • source.repository_url — from step 1.
  • source.repository_commit — fetch the latest commit SHA on the default branch (must be a 40-char SHA, not a tag or branch).
  • tasks[].name and tasks[].task_path — locate every @task-decorated function in the repo and record the function name and file path.
  • title — from the upstream README heading or pyproject.toml [project].name.
  • description — draft from the upstream README; keep to one short paragraph since the generated README links back upstream.
  • source.maintainers — defaults to the repo owner; only override if the repo is org-owned and the real maintainers are individuals.
  • tags — propose based on the eval's domain (e.g. Coding, games, tools). The upstream repo name is added automatically, so don't include it.

Use register/example_eval.yaml to determine what additional questions are needed.

Show the user the drafted YAML for confirmation before writing the file. Do not set id — it is auto-injected from the directory name.
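
Before showing the draft, a quick sanity check can catch the two mistakes called out above: a commit that is not a full SHA, and an id that was set by hand. The sketch below is illustrative only. check_draft is a hypothetical helper, and the nested dictionary layout is inferred from the dotted field names in this document; register/example_eval.yaml remains the authority on the real schema.

```python
import re

# A full Git commit SHA is exactly 40 hexadecimal characters.
FULL_SHA = re.compile(r"[0-9a-fA-F]{40}")


def check_draft(draft: dict) -> list[str]:
    """Return problems found in a drafted eval.yaml mapping."""
    problems = []

    if "id" in draft:
        problems.append("remove id: it is injected from the directory name")

    commit = draft.get("source", {}).get("repository_commit", "")
    if not FULL_SHA.fullmatch(str(commit)):
        problems.append("source.repository_commit must be a 40-character SHA")

    tasks = draft.get("tasks") or []
    if not tasks:
        problems.append("tasks is empty")
    for index, task in enumerate(tasks):
        for key in ("name", "task_path"):
            if not task.get(key):
                problems.append(f"tasks[{index}].{key} is missing")

    return problems
```

An empty list means the draft passes these checks; it does not replace the validation performed by make check in the next step.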

3. Run validation

make check

This validates the eval.yaml and auto-generates a README.md next to it. The README is fully generated from eval.yaml — do not edit it by hand.

Because the generated page defers to upstream for details, make sure the upstream repo's README covers the dataset, scorer, task parameters, and how the eval was validated.

4. Create a changelog fragment

uv run scriv create

5. Open a PR

Use the PR template. The reviewer will ping anyone listed under source.maintainers for acknowledgement before merging.

technical

github: UKGovernmentBEIS/inspect_evals
stars: 517
license: MIT
contributors: 100
last commit: 2026-05-29T04:29:08Z
file: .claude/skills/prepare-submission-workflow/SKILL.md