Investigate datasets from HuggingFace, CSV, or JSON files to understand their structure, fields, and data quality. Trigger whenever you need to explore or inspect a dataset yourself without using pre-written scripts.

$/plugin install inspect_evals

Investigate Dataset

This workflow helps you explore and understand datasets used in evaluations. It covers HuggingFace datasets, CSV files, and JSON/JSONL files.

Key Concepts

For detailed information on Inspect's dataset types (datasets.Dataset vs inspect_ai.dataset.Dataset), the hf_dataset() pipeline, caching behaviour, and test utilities, see references/inspect-dataset-patterns.md.

Common Patterns in Evals

Evals typically define:

  • DATASET_PATH: HuggingFace repo path (e.g., "qiaojin/PubMedQA")
  • DATASET_REVISION: Optional git revision/tag for reproducibility
  • record_to_sample(): Function converting raw records to Sample objects

Prerequisites

  • Access to the evaluation code to find dataset configuration
  • Python environment with datasets, pandas, and inspect_ai installed

Steps

1. Identify the Dataset Source

Look for these patterns in the evaluation code:

# HuggingFace dataset
DATASET_PATH = "org/dataset-name"
DATASET_REVISION = "v1.0"  # optional
hf_dataset(path=DATASET_PATH, name="subset", split="train", ...)

# CSV dataset
csv_dataset("path/to/file.csv", ...)
load_csv_dataset("https://example.com/file.csv", eval_name="myeval", ...)

# JSON/JSONL dataset
json_dataset("path/to/file.json", ...)
load_json_dataset("https://example.com/file.jsonl", eval_name="myeval", ...)
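
The patterns above can also be located mechanically. The following is a minimal sketch, assuming the eval follows the naming conventions listed in this document; `find_dataset_sources` and both regular expressions are hypothetical helpers written for illustration, not part of inspect_evals.

```python
import re

# Loader calls named in this document. The leading \b keeps "csv_dataset"
# from matching inside "load_csv_dataset".
LOADER_PATTERN = re.compile(
    r"\b(hf_dataset|csv_dataset|json_dataset|load_csv_dataset|load_json_dataset)\s*\("
)

# Module-level constants assigned a quoted string literal.
CONSTANT_PATTERN = re.compile(
    r"^\s*(DATASET_PATH|DATASET_REVISION)\s*=\s*[\"']([^\"']+)[\"']",
    re.MULTILINE,
)


def find_dataset_sources(source_code: str) -> dict:
    """Return the dataset constants and loader calls found in eval source text."""
    constants = dict(CONSTANT_PATTERN.findall(source_code))
    loaders = sorted(set(LOADER_PATTERN.findall(source_code)))
    return {"constants": constants, "loaders": loaders}
```

Pass it the text of the eval module, for example the result of `Path(...).read_text()` on the task file. Constants built from expressions rather than string literals will not be picked up, so treat an empty result as a prompt to read the code by hand.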

2. Load the Raw Dataset

For investigation, load the raw data directly rather than through Inspect's sample_fields transformation. Use datasets.load_dataset() for HuggingFace, pd.read_csv() for CSV, or pd.read_json() for JSON (pass lines=True for JSONL). For gated datasets, ensure HF_TOKEN is set or run huggingface-cli login.
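
One way to put the three loaders behind a single entry point is sketched below. `load_raw` is a hypothetical helper, and dispatching on the file suffix is an assumption made for illustration; anything without a `.csv`, `.json`, or `.jsonl` suffix is treated as a HuggingFace repo path.

```python
from pathlib import Path

import pandas as pd


def load_raw(source: str, split: str = "train", **hf_kwargs) -> pd.DataFrame:
    """Load raw records into a DataFrame, bypassing Inspect's sample conversion."""
    suffix = Path(source).suffix.lower()
    if suffix == ".csv":
        return pd.read_csv(source)
    if suffix == ".jsonl":
        # JSONL stores one JSON object per line, so lines=True is required.
        return pd.read_json(source, lines=True)
    if suffix == ".json":
        return pd.read_json(source)

    # Fall through: treat the source as a HuggingFace repo path.
    # Imported lazily so local files can be inspected without `datasets`.
    from datasets import load_dataset

    # hf_kwargs can carry name="subset" or revision="v1.0", for example.
    ds = load_dataset(source, split=split, **hf_kwargs)
    return ds.to_pandas()
```

Note that `pd.read_json()` infers dtypes by default, so ID-like strings of digits may come back as integers; compare against the raw file if exact types matter.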

3. Explore Structure and Quality

Use standard pandas/datasets methods to explore:

  • Schema: ds.features (HF) or df.dtypes (pandas)
  • Shape: len(ds), ds.column_names (HF) or df.shape, df.info(), df.columns (pandas)
  • Sample data: ds[:3] (HF) or df.head() (pandas)
  • Missing values: Check for None, empty strings, empty lists
  • Duplicates: Check ID uniqueness if an ID field exists
  • Value distributions: value_counts() for categorical columns, length stats for text fields
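
The missing-value and duplicate checks above can be sketched over plain records, which both libraries can produce (`ds.to_list()` for HF, `df.to_dict("records")` for pandas). `is_missing` and `quality_report` are hypothetical helpers, and the definition of "missing" used here (None, blank strings, empty containers) is an assumption to adjust per dataset.

```python
from collections import Counter
from typing import Any, Optional


def is_missing(value: Any) -> bool:
    """Treat None, blank strings, and empty lists/tuples/dicts as missing."""
    if value is None:
        return True
    if isinstance(value, str):
        return value.strip() == ""
    if isinstance(value, (list, tuple, dict)):
        return len(value) == 0
    return False


def quality_report(records: list, id_field: Optional[str] = None) -> dict:
    """Count missing values per field and, optionally, list duplicated IDs."""
    missing = Counter()
    for record in records:
        for field, value in record.items():
            if is_missing(value):
                missing[field] += 1

    report = {"rows": len(records), "missing": dict(missing)}
    if id_field is not None:
        id_counts = Counter(record.get(id_field) for record in records)
        # IDs that occur more than once, in first-seen order.
        report["duplicate_ids"] = [k for k, n in id_counts.items() if n > 1]
    return report
```

NaN values that pandas substitutes for empty cells are not covered by `is_missing`; `df.isna().sum()` reports those directly.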

For converting an Inspect Dataset (which has no .to_pandas()) to a DataFrame, see references/inspect-dataset-patterns.md.

4. Understand the Sample Conversion

Look at the record_to_sample function to understand how raw data maps to Inspect samples. Key questions:

  • Which fields become input? Are they combined/formatted?
  • What is the target format? (letter, text, JSON, etc.)
  • Are there choices for multiple choice?
  • What goes into metadata?
  • Are any records filtered out?
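
To answer the filtering question empirically, the converter can be dry-run over the raw records and its outcomes tallied. This is a minimal sketch: `dry_run_conversion` is a hypothetical helper, and it assumes the converter signals a skipped record by returning None or an empty list, which should be checked against the eval's own convention.

```python
from collections import Counter
from typing import Any, Callable, Iterable


def dry_run_conversion(
    records: Iterable[dict], convert: Callable[[dict], Any]
) -> Counter:
    """Apply a record_to_sample-style callable to each record and tally outcomes."""
    outcomes = Counter()
    for record in records:
        try:
            result = convert(record)
        except Exception as exc:
            # Group failures by exception type, e.g. "error:KeyError".
            outcomes[f"error:{type(exc).__name__}"] += 1
            continue
        # Assumption: None or an empty list means the record was filtered out.
        if result is None or (isinstance(result, list) and not result):
            outcomes["filtered"] += 1
        else:
            outcomes["converted"] += 1
    return outcomes
```

Called as `dry_run_conversion(ds.to_list(), record_to_sample)`, a non-zero "filtered" or "error:..." count points at records worth inspecting by hand.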

5. Test the Inspect Loading Pipeline

See references/inspect-dataset-patterns.md for the pattern to load through Inspect's hf_dataset() and verify sample conversion works correctly.

Quick Reference Commands

# View HF dataset info without downloading
uv run python -c "from datasets import load_dataset_builder; b = load_dataset_builder('org/name'); print(b.info)"

# List available configs/subsets
uv run python -c "from datasets import get_dataset_config_names; print(get_dataset_config_names('org/name'))"

# List available splits without downloading the data
uv run python -c "from datasets import get_dataset_split_names; print(get_dataset_split_names('org/name'))"

Caching and Troubleshooting

For cache locations (HuggingFace native, Inspect AI, Inspect Evals), force re-download commands, and test utilities, see references/inspect-dataset-patterns.md.

  • Gated dataset: Run huggingface-cli login or set HF_TOKEN
  • Rate limited: The hf_dataset wrapper in inspect_evals.utils.huggingface has built-in retry with backoff
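  • Large dataset: Use streaming=True to iterate without a full download, or split="train[:1000]" to load only the first 1,000 rows (the split is still downloaded before it is sliced)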
  • Missing revision: Check the dataset's "Files and versions" tab on HuggingFace
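
For the large-dataset case, a streaming sample might look like the sketch below. `take` and `sample_streaming` are hypothetical helpers; the only library call assumed is `datasets.load_dataset` with `streaming=True`, which yields records lazily instead of materialising the split.

```python
from itertools import islice
from typing import Any, Iterable


def take(records: Iterable[dict], n: int) -> list:
    """Materialise at most n records from any iterable without reading further."""
    return list(islice(records, n))


def sample_streaming(path: str, n: int = 1000, split: str = "train", **kwargs: Any) -> list:
    """Fetch the first n records of a HuggingFace split via streaming."""
    # Imported lazily; this call needs network access.
    from datasets import load_dataset

    stream = load_dataset(path, split=split, streaming=True, **kwargs)
    return take(stream, n)
```

The result is a list of plain dicts, so it can be passed to `pd.DataFrame(...)` for the same exploration steps used on a fully loaded dataset. The first n records are not a random sample, so distributions computed from them may be skewed if the dataset is ordered.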

technical

github: UKGovernmentBEIS/inspect_evals
stars: 517
license: MIT
contributors: 100
last commit: 2026-05-29T04:29:08Z
file: .claude/skills/investigate-dataset/SKILL.md
