investigate-dataset

Investigate datasets from HuggingFace, CSV, or JSON files to understand their structure, fields, and data quality. Trigger whenever you need to explore or inspect a dataset yourself without using pre-written scripts.

Investigate Dataset

This workflow helps you explore and understand datasets used in evaluations. It covers HuggingFace datasets, CSV files, and JSON/JSONL files.

Key Concepts

For detailed information on Inspect's dataset types (datasets.Dataset vs inspect_ai.dataset.Dataset), the hf_dataset() pipeline, caching behaviour, and test utilities, see references/inspect-dataset-patterns.md.

Common Patterns in Evals

Evals typically define:

DATASET_PATH: HuggingFace repo path (e.g., "qiaojin/PubMedQA")
DATASET_REVISION: Optional git revision/tag for reproducibility
record_to_sample(): Function converting raw records to Sample objects

Prerequisites

Access to the evaluation code to find dataset configuration
Python environment with datasets, pandas, and inspect_ai installed

Steps

1. Identify the Dataset Source

Look for these patterns in the evaluation code:

# HuggingFace dataset
DATASET_PATH = "org/dataset-name"
DATASET_REVISION = "v1.0"  # optional
hf_dataset(path=DATASET_PATH, name="subset", split="train", ...)

# CSV dataset
csv_dataset("path/to/file.csv", ...)
load_csv_dataset("https://example.com/file.csv", eval_name="myeval", ...)

# JSON/JSONL dataset
json_dataset("path/to/file.json", ...)
load_json_dataset("https://example.com/file.jsonl", eval_name="myeval", ...)
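When an eval spans several modules, a quick scan for the patterns above can locate the dataset configuration. The following is a minimal sketch that uses only the standard library; `find_dataset_sources` is a hypothetical helper, not part of Inspect, and it only looks for the constant and loader names listed above.

```python
import re

# Loader names taken from the patterns listed above.
LOADER_NAMES = (
    "hf_dataset",
    "csv_dataset",
    "load_csv_dataset",
    "json_dataset",
    "load_json_dataset",
)

# \b stops `csv_dataset` from matching inside `load_csv_dataset`.
LOADER_RE = re.compile(r"\b(" + "|".join(LOADER_NAMES) + r")\s*\(")
CONSTANT_RE = re.compile(r"(DATASET_PATH|DATASET_REVISION)\s*=")


def find_dataset_sources(source: str) -> list[tuple[int, str, str]]:
    """Return (line_number, kind, line_text) for each dataset-related line."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # skip comment-only lines
        const = CONSTANT_RE.match(stripped)
        if const:
            hits.append((lineno, const.group(1), stripped))
            continue
        call = LOADER_RE.search(stripped)
        if call:
            hits.append((lineno, call.group(1), stripped))
    return hits


example = '''
DATASET_PATH = "org/dataset-name"
DATASET_REVISION = "v1.0"  # optional
dataset = hf_dataset(path=DATASET_PATH, split="train")
extra = load_csv_dataset("https://example.com/file.csv", eval_name="myeval")
'''

for hit in find_dataset_sources(example):
    print(hit)
```

To scan a real module, pass the file's text, for example `find_dataset_sources(Path("my_eval.py").read_text())`, where `my_eval.py` stands in for the eval's source file.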

2. Load the Raw Dataset

For investigation, load the raw data directly (not through Inspect's sample_fields transformation). Use standard datasets.load_dataset() for HuggingFace, pd.read_csv() for CSV, or pd.read_json() for JSON/JSONL. For gated datasets, ensure HF_TOKEN is set or run huggingface-cli login.
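One way to keep raw loading uniform across the three source types is a small dispatcher. This is a sketch under assumptions: `source_kind` and `load_raw` are hypothetical names, the file type is guessed from the extension alone, and anything without a recognised extension is treated as a HuggingFace repo path. The `datasets` and `pandas` imports are deferred so that only the library actually needed has to be installed.

```python
from pathlib import PurePosixPath


def source_kind(source: str) -> str:
    """Classify a source string as 'csv', 'json', 'jsonl', or 'hf'."""
    # Drop any URL query string before looking at the extension.
    suffix = PurePosixPath(source.split("?", 1)[0]).suffix.lower()
    if suffix in {".csv", ".json", ".jsonl"}:
        return suffix.lstrip(".")
    # Assumption: no recognised file extension means a HuggingFace repo path.
    return "hf"


def load_raw(source: str, **kwargs):
    """Load raw data without Inspect's record-to-sample conversion."""
    kind = source_kind(source)
    if kind == "hf":
        from datasets import load_dataset  # deferred import; needs `datasets`

        return load_dataset(source, **kwargs)

    import pandas as pd  # deferred import; needs `pandas`

    if kind == "csv":
        return pd.read_csv(source, **kwargs)
    # JSONL holds one JSON object per line, which pandas reads with lines=True.
    return pd.read_json(source, lines=(kind == "jsonl"), **kwargs)


# Example calls (not executed here because they need network or local files):
#   load_raw("org/dataset-name", name="subset", split="train", revision="v1.0")
#   load_raw("path/to/file.csv")
#   load_raw("https://example.com/file.jsonl")
```

Extra keyword arguments are forwarded unchanged, so a pinned `revision` from the eval's `DATASET_REVISION` can be passed straight through to `load_dataset()`.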

3. Explore Structure and Quality

Use standard pandas/datasets methods to explore:

Schema: ds.features (HF) or df.dtypes (pandas)
Shape: len(ds), ds.column_names (HF) or df.info(), df.columns (pandas)
Sample data: ds[:3] (HF; returns a dict mapping each column to a list of values) or df.head() (pandas)
Missing values: Check for None, empty strings, empty lists
Duplicates: Check ID uniqueness if an ID field exists
Value distributions: value_counts() for categorical columns, length stats for text fields
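The missing-value, duplicate, and length checks above can be sketched over plain row dictionaries, which both libraries can produce (`list(ds)` for a HuggingFace dataset, `df.to_dict("records")` for pandas). `is_missing` and `quality_report` are hypothetical helpers, and the sample rows are made up for illustration.

```python
from collections import Counter
from statistics import mean
from typing import Optional


def is_missing(value) -> bool:
    """Treat None, blank strings, and empty lists/tuples/dicts as missing."""
    if value is None:
        return True
    if isinstance(value, str):
        return value.strip() == ""
    if isinstance(value, (list, tuple, dict)):
        return len(value) == 0
    return False


def quality_report(records: list, id_field: Optional[str] = None) -> dict:
    """Summarise missing values, duplicate IDs, and text lengths per field."""
    fields = sorted({key for record in records for key in record})
    report = {"rows": len(records), "missing": {}, "text_length": {}}
    for field in fields:
        values = [record.get(field) for record in records]
        report["missing"][field] = sum(is_missing(v) for v in values)
        # Length stats only cover non-blank string values.
        lengths = [len(v) for v in values if isinstance(v, str) and v.strip()]
        if lengths:
            report["text_length"][field] = {
                "min": min(lengths),
                "mean": mean(lengths),
                "max": max(lengths),
            }
    if id_field is not None:
        counts = Counter(record.get(id_field) for record in records)
        report["duplicate_ids"] = [key for key, n in counts.items() if n > 1]
    return report


# Made-up rows: one blank question, one missing label, one repeated ID.
records = [
    {"id": "a", "question": "What is 2+2?", "label": "yes"},
    {"id": "b", "question": "", "label": "no"},
    {"id": "a", "question": "Why?", "label": None},
]
report = quality_report(records, id_field="id")
print(report["missing"])        # {'id': 0, 'label': 1, 'question': 1}
print(report["duplicate_ids"])  # ['a']
```

pandas stores missing cells as NaN, which this sketch does not count as missing; for DataFrames, `df.isna().sum()` covers that case.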

For converting an Inspect Dataset (which has no .to_pandas()) to a DataFrame, see references/inspect-dataset-patterns.md.

4. Understand the Sample Conversion

Look at the record_to_sample function to understand how raw data maps to Inspect samples. Key questions:

Which fields become input? Are they combined/formatted?
What is the target format? (letter, text, JSON, etc.)
Are there choices for multiple choice?
What goes into metadata?
Are any records filtered out?
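To answer the last question empirically, the converter can be run over the raw records and the outcomes tallied. The sketch below is illustrative: `audit_conversion` is a hypothetical helper, and it assumes a converter signals "skip this record" by returning `None` or an empty list, which should be checked against the eval's own code. The demo converter returns a `SimpleNamespace` stand-in so the example runs without `inspect_ai`; a real `record_to_sample` would return `Sample` objects.

```python
from collections import Counter
from types import SimpleNamespace


def audit_conversion(records, record_to_sample):
    """Run record_to_sample over raw records and tally what happens to each."""
    outcome = Counter()
    examples = []
    for index, record in enumerate(records):
        try:
            result = record_to_sample(record)
        except Exception as exc:  # record the failure and keep going
            outcome[f"error:{type(exc).__name__}"] += 1
            continue
        # Assumption: None or an empty list means the record was filtered out.
        samples = result if isinstance(result, list) else [result]
        samples = [s for s in samples if s is not None]
        if not samples:
            outcome["filtered"] += 1
            continue
        outcome["converted"] += 1
        if len(examples) < 3:  # keep a few converted samples for eyeballing
            examples.append((index, samples[0]))
    return outcome, examples


def demo_record_to_sample(record):
    # Stand-in for an eval's converter, for illustration only.
    if not record["question"]:
        return []
    return SimpleNamespace(input=record["question"], target=record["answer"].upper())


raw = [
    {"question": "Is water wet?", "answer": "yes"},
    {"question": "", "answer": "no"},
    {"question": "Is fire cold?"},  # no "answer" key, so conversion raises KeyError
]
outcome, examples = audit_conversion(raw, demo_record_to_sample)
print(dict(outcome))
for index, sample in examples:
    print(index, sample.input, sample.target)
```

Comparing the `converted` count with the number of raw rows shows how many records the conversion drops, and the `error:` keys point at fields that are absent or malformed in some records.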

5. Test the Inspect Loading Pipeline

See references/inspect-dataset-patterns.md for the pattern to load through Inspect's hf_dataset() and verify sample conversion works correctly.

Quick Reference Commands

# View HF dataset info without downloading
uv run python -c "from datasets import load_dataset_builder; b = load_dataset_builder('org/name'); print(b.info)"

# List available configs/subsets
uv run python -c "from datasets import get_dataset_config_names; print(get_dataset_config_names('org/name'))"

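# List available splits without downloading the data (pass a config name as the second argument if the dataset has several)
uv run python -c "from datasets import get_dataset_split_names; print(get_dataset_split_names('org/name'))"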

Caching and Troubleshooting

For cache locations (HuggingFace native, Inspect AI, Inspect Evals), force re-download commands, and test utilities, see references/inspect-dataset-patterns.md.

Gated dataset: Run huggingface-cli login or set HF_TOKEN
Rate limited: The hf_dataset wrapper in inspect_evals.utils.huggingface has built-in retry with backoff
Large dataset: Pass streaming=True to load_dataset(), or a slice such as split="train[:1000]", to inspect only the first rows
Missing revision: Check the dataset's "Files and versions" tab on HuggingFace

technical

github: UKGovernmentBEIS/inspect_evals
stars: 517
license: MIT
contributors: 100
last commit: 2026-05-29T04:29:08Z
file: .claude/skills/investigate-dataset/SKILL.md
