bond-data

community[skill]

Reference for the Dickerson corporate bond dataset: column mappings between WRDS and PyBondLab, rating encoding, return definitions, signal clusters, and data gotchas. Auto-apply when loading bond data for PyBondLab analysis.

$ /plugin install ai-asset-pricing

Bond Data Reference

The Dickerson cleaned TRACE corporate bond panel has 141 columns and 2.83M rows, covering 1973-01 to 2025-03. Fetch it via bonds-wrds-expert; it is stored in data/ as Parquet.

PyBondLab Column Mapping

| Parquet Column | PBL Name | Description |
|---|---|---|
| cusip | ID | 9-digit bond CUSIP |
| ret_vw | ret | Month-end total return (primary) |
| mcap_e | VW | Bond market cap at month-end |
| spc_rat | RATING_NUM | S&P composite rating (1-22) |
| permno | PERMNO | CRSP equity link (for WithinFirmSort) |

Three Mapping Approaches

```python
# `pbl` is the PyBondLab import alias, `sf` a StrategyFormation instance,
# and `data` the bond panel loaded as a pandas DataFrame.

# Option 1: fit() params (StrategyFormation)
result = sf.fit(IDvar='cusip', RETvar='ret_vw', VWvar='mcap_e', RATINGvar='spc_rat')

# Option 2: rename upfront
data = data.rename(columns={'cusip': 'ID', 'ret_vw': 'ret', 'mcap_e': 'VW', 'spc_rat': 'RATING_NUM'})

# Option 3: Batch columns= dict (pbl_name -> user_name)
batch = pbl.BatchStrategyFormation(
    data=data,
    columns={'ID': 'cusip', 'ret': 'ret_vw', 'VW': 'mcap_e', 'RATING_NUM': 'spc_rat'},
    ...
)
```
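Options 2 and 3 use the same pairs in opposite directions: `rename` takes user_name -> pbl_name, while the batch `columns=` dict takes pbl_name -> user_name. The sketch below keeps a single dict and derives the other by inversion. The names `PBL_TO_USER`, `USER_TO_PBL`, and `invert_mapping` are hypothetical helpers, not part of PyBondLab; only the column names come from the mapping table.

```python
# Hypothetical helper: keep one mapping and derive the other direction from it.
PBL_TO_USER = {"ID": "cusip", "ret": "ret_vw", "VW": "mcap_e", "RATING_NUM": "spc_rat"}


def invert_mapping(mapping):
    """Swap keys and values, refusing to silently merge duplicate values."""
    inverted = {}
    for key, value in mapping.items():
        if value in inverted:
            raise ValueError(f"duplicate target column: {value!r}")
        inverted[value] = key
    return inverted


# user_name -> pbl_name, the direction expected by DataFrame.rename(columns=...)
USER_TO_PBL = invert_mapping(PBL_TO_USER)
```

With this in place, `data.rename(columns=USER_TO_PBL)` would reproduce Option 2, and `PBL_TO_USER` could be passed as the Option 3 `columns=` argument.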

Real Data Prep

```python
data['spc_rat'] = data['spc_rat'].astype('float64')  # nullable IntegerArray breaks numba
```
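To illustrate what the cast does, here is a minimal sketch on a toy frame (the values are made up). It assumes the rating column arrives as pandas' nullable `Int64` dtype, which is how missing ratings can end up as `pd.NA` rather than `NaN`.

```python
import pandas as pd

# Toy frame standing in for the bond panel; the middle rating is missing.
data = pd.DataFrame({"spc_rat": pd.array([1, None, 12], dtype="Int64")})
dtype_before = str(data["spc_rat"].dtype)

# Cast to a plain NumPy float column: pd.NA becomes NaN.
data["spc_rat"] = data["spc_rat"].astype("float64")
dtype_after = str(data["spc_rat"].dtype)
missing_count = int(data["spc_rat"].isna().sum())
```

After the cast, the missing rating is an ordinary `NaN` in a `float64` column, so unrated bonds still need to be handled explicitly downstream.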

Alternate Return Columns

| Column | When to Use |
|---|---|
| ret_vw | Default. Use with MMN-adjusted signals |
| ret_vw_bgn | Use ONLY with noisy/unadjusted signals (_mmn suffix) |
| ret_vwx | Excess return (ret_vw minus duration-matched Treasury) |

VW Column: mcap_e vs mcap_s

| Column | Definition | When to Use |
|---|---|---|
| mcap_e | Market cap at end of month t | Default. Contemporary with signal |
| mcap_s | Market cap at end of month t-1 | Lagged market cap |

Rating Encoding

spc_rat: S&P composite rating (S&P first, Moody's fallback).

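| Numeric | Rating | Grade |
|---|---|---|
| 1 | AAA | IG |
| 2-4 | AA+/AA/AA- | IG |
| 5-7 | A+/A/A- | IG |
| 8-10 | BBB+/BBB/BBB- | IG |
| 11-13 | BB+/BB/BB- | HY |
| 14-16 | B+/B/B- | HY |
| 17-19 | CCC+/CCC/CCC- | HY |
| 20-22 | CC/C/D | HY/Default |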

PyBondLab filters: rating='IG' → spc_rat ≤ 10, rating='NIG' → spc_rat > 10.
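The encoding above can be turned into a small lookup for labelling output tables. This is an illustrative sketch: `RATING_LABELS` and `decode_rating` are hypothetical names, not PyBondLab functions, and the grade labels follow the table (IG for codes up to 10, HY above).

```python
# Letter ratings in numeric order, so that code k maps to RATING_LABELS[k - 1].
RATING_LABELS = [
    "AAA",
    "AA+", "AA", "AA-",
    "A+", "A", "A-",
    "BBB+", "BBB", "BBB-",
    "BB+", "BB", "BB-",
    "B+", "B", "B-",
    "CCC+", "CCC", "CCC-",
    "CC", "C", "D",
]


def decode_rating(spc_rat):
    """Return (letter rating, grade) for a numeric spc_rat code in 1..22."""
    code = int(spc_rat)
    if not 1 <= code <= len(RATING_LABELS):
        raise ValueError(f"spc_rat must be between 1 and 22, got {spc_rat!r}")
    grade = "IG" if code <= 10 else "HY"
    return RATING_LABELS[code - 1], grade
```

For example, `decode_rating(10)` gives `("BBB-", "IG")` and `decode_rating(11)` gives `("BB+", "HY")`, matching the IG/NIG cutoff at 10.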

Signal Clusters

141 columns in 9 clusters:

| Cluster | Key Signals | Count |
|---|---|---|
| Identifiers & Returns | cusip, date, ret_vw, ret_type, rfret | 22 |
| Bond Characteristics | spc_rat, call, fce_val, 144a | 8 |
| Size | mcap_s, mcap_e, sze | 3 |
| Spreads & Duration | cs, md_dur, tmat, ytm, convx, age | 8 |
| Value | bbtm, val_hz, val_ipr | 5 |
| Momentum & Reversal | mom3_1, mom6_1, mom12_1, str, ltr* | 21 |
| Illiquidity | pi, ami, roll, spd_abs, spd_rel | 13 |
| Volatility & Risk | dvol, rvol, ivol_*, var_95, es_90 | 16 |
| Factor Betas | b_mktb, b_dvix, b_defb, b_psb | 41 |

Common test signals: cs, tmat, mom3_1, mom6_1, mom12_1, bbtm, md_dur, dvol, ami, var_95

Data Subsets

| Subset | Filter | Rows | Use Case |
|---|---|---|---|
| Full | None | 2.83M | Includes pre-TRACE (1973+) |
| TRACE-only | date >= '2002-08-01' | 1.86M | Recommended |
| IG | spc_rat <= 10 | ~1.6M | Investment grade |
| HY | spc_rat > 10 | ~1.2M | High yield |
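The subset filters can be written as plain boolean masks in pandas. The sketch below uses a made-up four-row frame and assumes `date` is stored as a datetime column and `spc_rat` has already been cast to float; check both on the real Parquet file.

```python
import pandas as pd

# Toy rows for illustration only; the last bond is unrated.
data = pd.DataFrame({
    "date": pd.to_datetime(["1995-06-30", "2002-08-31", "2010-01-31", "2020-03-31"]),
    "spc_rat": [5.0, 9.0, 14.0, float("nan")],
})

# TRACE-only: keep observations from 2002-08 onward.
trace_only = data[data["date"] >= pd.Timestamp("2002-08-01")]

# Rating splits: comparisons with NaN are False, so unrated rows
# land in neither the IG nor the HY subset.
ig = trace_only[trace_only["spc_rat"] <= 10]
hy = trace_only[trace_only["spc_rat"] > 10]
```

Because unrated rows drop out of both rating subsets, the IG and HY row counts need not add up to the size of the frame they were split from.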

Gotchas

  1. Multiple bonds per issuer — one row per tranche per month. Aggregate by issuer_cusip or permno for firm-level.
  2. cs_sprd ≠ credit spread — it's Corwin-Schultz high-low spread (liquidity). Actual credit spread is cs.
  3. ar_sprd ≠ adjusted return spread — it's Abdi-Ranaldo closing price spread (liquidity).
  4. No lead/lag — all variables sampled at end of month t.
  5. tret NULL on recent dates — use ret_vw - rfret for excess returns instead.
  6. 19% of bonds lack permno (11% in TRACE-only) — dropped in WithinFirmSort.
  7. 144a column name — starts with digit, use data['144a'].
  8. ret_type values: standard, default_evnt, trad_in_def. Filter ret_type == 'standard' to exclude defaults.
  9. Composite ratings: spc_rat uses S&P first, Moody's fallback. Not pure S&P.
  10. lib/libd = Latent Implementation Bias — NOT LIBOR.
  11. Pre-TRACE era (before 2002-08) — lower quality. Use TRACE-only subset.
  12. Rolling beta start dates — betas use 36-month windows, start ~2003-08.
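Several of these gotchas translate into one-line pandas operations. The sketch below runs on a made-up single-month frame. The value-weighted firm-level return is only one possible aggregation choice, and the names `exret` and `firm_ret` are hypothetical, not columns of the dataset.

```python
import pandas as pd

# Toy single-month panel: two tranches of one firm, one default event,
# and one bond without a permno link. All values are made up.
data = pd.DataFrame({
    "date": ["2020-01-31"] * 4,
    "cusip": ["A1", "A2", "B1", "B2"],
    "permno": [10001.0, 10001.0, 20002.0, float("nan")],
    "ret_vw": [0.010, 0.020, -0.300, 0.005],
    "rfret": [0.002, 0.002, 0.002, 0.002],
    "mcap_e": [100.0, 300.0, 50.0, 80.0],
    "ret_type": ["standard", "standard", "default_evnt", "standard"],
    "144a": [0, 1, 0, 0],
})

# Gotcha 8: keep only standard returns to exclude default-related rows.
standard = data[data["ret_type"] == "standard"].copy()

# Gotcha 5: excess return built from ret_vw and rfret.
standard["exret"] = standard["ret_vw"] - standard["rfret"]

# Gotcha 7: the column name starts with a digit, so use bracket access.
rule_144a = standard[standard["144a"] == 1]

# Gotchas 1 and 6: collapse tranches to one row per firm-month,
# dropping bonds that have no permno link.
linked = standard.dropna(subset=["permno"]).copy()
linked["w_ret"] = linked["ret_vw"] * linked["mcap_e"]
sums = linked.groupby(["date", "permno"])[["w_ret", "mcap_e"]].sum()
firm_ret = (sums["w_ret"] / sums["mcap_e"]).reset_index(name="firm_ret")
```

Bracket access is needed for `144a` because attribute-style access (`data.144a`) is not valid Python.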

technical

github: Alexander-M-Dickerson/ai-asset-pricing
stars: 49
license: MIT
contributors: 1
last commit: 2026-04-19T07:58:01Z
file: .claude/skills/bond-data/SKILL.md
