Skill Index

inspect_evals/

generate-asset-actions

community[skill]

Generate asset-actions.yaml from ASSETS.yaml by classifying assets into priority tiers. Use when the user asks to regenerate, update, or refresh the asset actions.

$/plugin install inspect_evals

details

Generate Asset Policy

Regenerate internal/audits/asset-actions.yaml and internal/audits/audit-summary.md from ASSETS.yaml.

If ASSETS.yaml may be stale, run uv run python tools/generate_asset_manifest.py first.

Run uv run python tools/summarise_asset_manifest.py to get aggregate counts (by type, by state, totals). Use these numbers when populating audit-summary.md.

Classification

Read ASSETS.yaml. For each asset, determine target stage first, then priority. Process both state: floating assets AND state: pinned assets that match known-unstable sources (since their target is controlled, they are not yet at their target stage).

Target stages (per ADR-0007)

The target stage depends on host reliability, not asset type:

  • controlled (Stage 2) — any asset where upstream has broken before, maintainer is unresponsive/deprecated, OR host is unreliable (personal repos, Google Drive, .edu domains, university servers, any host without version control). This applies to git_clone, direct_url, and huggingface alike.
  • pinned (Stage 1) — assets on reliable, version-controlled hosts (GitHub, HuggingFace, well-known CDNs) with no history of breakage.

Per ADR-0007: "Anything hosted on a less reliable domain (personal websites, Google Drive, university servers, or any host without version control) should skip straight to Stage 2."

Priority tiers

  1. Urgent — all other floating refs on reliable hosts. Target is pinned.
  2. High — matches a known-unstable source (see registry below). Target is controlled.
  3. Medium — unreliable host (drive.google.com, .edu domains, personal repos/websites) not already in the known-unstable registry. Target is controlled.

For assets with state: pinned and a {SHA} placeholder but no checksum, classify as Low (target: pinned with checksum).

Omit assets already at their target stage.

Every entry needs: eval, source, type, state, target, action, reason.

Known-Unstable Sources

Update this list when new instability is discovered.

SourceEvalIncident
xlang-ai/OSWorldosworldFiles removed (PR #958)
openai/evalsmakemesayDeprecated upstream
corebench.cs.princeton.educore_benchUniversity server, no versioning
epatey/fontsosworldPersonal repo
ShishirPatil/gorillabfclData format issues (PR #954)
yunx-z/MLRC-Benchmlrc_benchBroken task
LRudL/sadsadUpstream bugs (issues #7, #8)
meg-tong/sycophancy-evalsycophancyInvalid JSON/NaN, workaround in code
josancamon/paperbenchpaperbenchPaper ID mismatch (HF discussion #2)
sentientfutures/moru-benchmarkmoruExact duplicate rows

Verification

  1. asset-actions.yaml parses as valid YAML
  2. Every floating asset in ASSETS.yaml appears in urgent, high, or medium
  3. floating_assets + needing_checksums + no_action_needed == total_external_assets
  4. Numbers in audit-summary.md match output of summarise_asset_manifest.py

technical

github
UKGovernmentBEIS/inspect_evals
stars
517
license
MIT
contributors
100
last commit
2026-05-29T04:29:08Z
file
.claude/skills/generate-asset-actions/SKILL.md

related