relay-80-100-workflow

community[skill]

Use when writing agent-relay workflows that must fully validate features end-to-end before merging. Covers the 80-to-100 pattern: going beyond "code compiles" to "feature works, tested E2E locally." Includes PGlite for in-memory Postgres testing, mock sandbox patterns, test-fix-rerun loops, verify gates after every edit, and the full lifecycle from implementation through passing tests to commit.

$/plugin install relay

Writing 80-to-100 Validated Workflows

Overview

Most agent workflows get features to ~80%: code written, types check, maybe a build passes. This skill covers the 80-to-100 gap — making workflows that fully validate features end-to-end before committing. The goal: every feature merged via these workflows is tested, verified, and known-working, not just "it compiles."

When to Use

  • Writing workflows where the deliverable must be production-ready, not just code-complete
  • Features that touch databases, APIs, or infrastructure that can be tested locally
  • Any workflow where "it compiles" is not sufficient proof of correctness
  • When you want confidence that the commit actually works before deploying

Core Principle: Test In The Workflow

The key insight: run tests as deterministic steps inside the workflow itself. Don't just write test files — execute them, verify they pass, fix failures, and re-run. The workflow doesn't commit until tests are green.

implement → write tests → run tests → fix failures → re-run → build check → regression check → commit

This means the commit at the end of the workflow represents code that is proven working, not just code that an agent wrote and claimed works.

The Test-Fix-Rerun Pattern

Every testable feature in a workflow should follow this three-step pattern:

// Step 1: Run tests (allow failure — we expect issues on first run)
.step('run-tests', {
  type: 'deterministic',
  dependsOn: ['create-tests'],
  command: 'npx tsx --test tests/my-feature.test.ts 2>&1 | tail -60',
  captureOutput: true,
  failOnError: false,  // <-- Don't fail the workflow, let the agent fix it
})

// Step 2: Agent reads output, fixes issues, re-runs until green
.step('fix-tests', {
  agent: 'tester',
  dependsOn: ['run-tests'],
  task: `Check the test output and fix any failures.

Test output:
{{steps.run-tests.output}}

If all tests passed, do nothing.
If there are failures:
1. Read the failing test file and source files
2. Fix the issues (could be in test or source)
3. Re-run: npx tsx --test tests/my-feature.test.ts
4. Keep fixing until ALL tests pass.`,
  verification: { type: 'exit_code' },
})

// Step 3: Deterministic final run — this one MUST pass
.step('run-tests-final', {
  type: 'deterministic',
  dependsOn: ['fix-tests'],
  command: 'npx tsx --test tests/my-feature.test.ts 2>&1',
  captureOutput: true,
  failOnError: true,  // <-- Hard fail if tests still broken
})

Why three steps instead of one?

  • The first run captures output for the agent to diagnose
  • The agent step can iterate (read errors, fix, re-run) multiple times
  • The final deterministic run is the gate — no agent judgment, just pass/fail
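The difference between the first run and the gate comes down to what happens on a non-zero exit. As a rough local model (not the agent-relay SDK's implementation; `runDeterministic` and `StepResult` are hypothetical names), a deterministic step can be pictured as a shell command whose exit status is either returned to the next step or, with `failOnError: true`, stops the run:

```typescript
import { spawnSync } from 'node:child_process';

// Hypothetical result shape for one deterministic step.
interface StepResult {
  exitCode: number;
  output: string; // stdout followed by stderr (not interleaved)
}

// Runs `command` through sh. With failOnError, a non-zero exit throws,
// which is how this model represents "the workflow stops here".
function runDeterministic(command: string, opts: { failOnError: boolean }): StepResult {
  const proc = spawnSync('sh', ['-c', command], { encoding: 'utf8' });
  // status is null if the process was killed by a signal; count that as a failure.
  const exitCode = proc.status ?? 1;
  const output = `${proc.stdout ?? ''}${proc.stderr ?? ''}`;
  if (opts.failOnError && exitCode !== 0) {
    throw new Error(`step failed with exit code ${exitCode}:\n${output}`);
  }
  return { exitCode, output };
}

// First run: the failure is captured for the agent instead of stopping the run.
const firstRun = runDeterministic('echo "1 failing"; exit 1', { failOnError: false });
console.log(firstRun.exitCode); // 1

// A pipeline reports the exit status of its last command (here `tail`).
const piped = runDeterministic('false | tail -5', { failOnError: false });
console.log(piped.exitCode); // 0
```

The second example is why the first run can pipe through `tail -60` while the gate step does not: in a plain `sh` pipeline the exit status is `tail`'s, so a failing test run would look green to a `failOnError: true` step.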

PGlite: In-Memory Postgres for Database Testing

When your feature touches the database, use PGlite — a WASM-based Postgres that runs in-process. No Docker, no external services, no flaky network dependencies.

Setup

Install as a dev dependency in the workflow:

.step('install-pglite', {
  type: 'deterministic',
  command: 'npm install --save-dev @electric-sql/pglite 2>&1 | tail -5',
  captureOutput: true,
})

Test Helper Pattern

Create a reusable helper that boots an in-memory Postgres with your schema:

// tests/helpers/pglite-db.ts
import { PGlite } from '@electric-sql/pglite';
import { drizzle } from 'drizzle-orm/pglite';
import * as schema from '../../packages/web/lib/db/schema.js';

// Raw DDL mirroring your Drizzle schema. This helper applies hand-written DDL
// instead of running Drizzle migrations, so keep it in sync with schema.ts.
const MY_TABLE_DDL = `
CREATE TABLE IF NOT EXISTS my_table (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  name TEXT NOT NULL,
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
`;

export async function createTestDb() {
  const pg = new PGlite();
  await pg.exec(MY_TABLE_DDL);
  const db = drizzle(pg, { schema });
  return { db, pg, schema, cleanup: () => pg.close() };
}
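Because the helper's DDL is maintained by hand, it can drift from `schema.ts`. One cheap guard is a check that the DDL still declares the tables and columns your code relies on. The sketch below is deliberately naive: `missingColumns` and the `expected` map are hypothetical, the regex assumes one `CREATE TABLE name (...);` block per table with one column per line, and it does not read the Drizzle schema itself.

```typescript
// Naive drift check: returns "table.column" entries that `expected` requires
// but the DDL does not declare. Assumes one column definition per line.
function missingColumns(ddl: string, expected: Record<string, string[]>): string[] {
  const declared = new Map<string, Set<string>>();
  const tableRe = /CREATE TABLE(?: IF NOT EXISTS)?\s+(\w+)\s*\(([\s\S]*?)\);/gi;
  for (const [, table, body] of ddl.matchAll(tableRe)) {
    const columns = new Set<string>();
    for (const line of body.split('\n')) {
      const first = line.trim().split(/\s+/)[0];
      if (first) columns.add(first.toLowerCase());
    }
    declared.set(table.toLowerCase(), columns);
  }

  const missing: string[] = [];
  for (const [table, cols] of Object.entries(expected)) {
    const have = declared.get(table.toLowerCase());
    for (const col of cols) {
      if (!have?.has(col.toLowerCase())) missing.push(`${table}.${col}`);
    }
  }
  return missing;
}

// Same DDL as the helper above.
const MY_TABLE_DDL = `
CREATE TABLE IF NOT EXISTS my_table (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  name TEXT NOT NULL,
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
`;

// Suppose schema.ts has since gained an updated_at column:
console.log(missingColumns(MY_TABLE_DDL, { my_table: ['id', 'name', 'updated_at'] }));
// [ 'my_table.updated_at' ]
```

A check like this only catches columns missing from the DDL; it cannot tell whether types, defaults, or constraints still match `schema.ts`.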

PGlite Gotchas

| Issue | Fix |
|---|---|
| pgcrypto extension not available | Use gen_random_uuid() (built-in since PG 13) or generate UUIDs in app code |
| UUID columns | PGlite supports UUID natively; no special handling needed |
| drizzle-orm/pglite import | Available in drizzle-orm 0.30+. If the import is not found, check your installed version. |
| Index creation | PGlite supports standard CREATE INDEX; no limitations |
| Concurrent writes | PGlite is single-connection. Test concurrent logic with sequential assertions. |

Test Structure

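// tests/my-feature.test.ts
import { describe, it } from 'node:test';
import assert from 'node:assert/strict';
import { randomUUID } from 'node:crypto';
import { createTestDb } from './helpers/pglite-db.js';

describe('my feature', () => {
  it('does the thing correctly', async () => {
    const { pg, cleanup } = await createTestDb();
    try {
      // Arrange: seed a row in the in-memory Postgres
      const testId = randomUUID();
      await pg.query('INSERT INTO my_table (id, name) VALUES ($1, $2)', [testId, 'expected']);

      // Act: call your module here with `db` from createTestDb(). This skeleton
      // reads the row back directly so that it runs as written.
      const { rows } = await pg.query<{ name: string }>(
        'SELECT name FROM my_table WHERE id = $1',
        [testId],
      );
      const result = rows[0];

      // Assert
      assert.equal(result?.name, 'expected');
    } finally {
      await cleanup();
    }
  });
});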
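If every test repeats the same try/finally, the pattern can be factored into a small wrapper. This is a sketch: `withResource` is a hypothetical helper, and it only assumes the factory returns an object with a `cleanup()` method, which is the shape `createTestDb()` returns.

```typescript
// Creates a fresh resource, runs the test body, and always releases the
// resource, even when the body throws or an assertion fails.
async function withResource<R extends { cleanup: () => unknown }, T>(
  create: () => Promise<R>,
  body: (resource: R) => Promise<T>,
): Promise<T> {
  const resource = await create();
  try {
    return await body(resource);
  } finally {
    await resource.cleanup();
  }
}

// Stand-in factory with the same { ..., cleanup } shape as createTestDb().
const fakeDb = async () => ({ value: 21, cleanup: () => console.log('cleaned up') });

withResource(fakeDb, async ({ value }) => value * 2).then((n) => console.log(n));
// prints "cleaned up", then 42
```

In a real test the call would look like `it('...', () => withResource(createTestDb, async ({ db }) => { ... }))`, which keeps one fresh database per test without repeating the cleanup logic.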

Verify Gates After Every Edit

Never trust that an agent edited a file correctly. Add a deterministic verify gate after every agent edit step:

// Agent edits a file
.step('edit-schema', {
  agent: 'impl',
  dependsOn: ['read-schema'],
  task: `Edit packages/web/lib/db/schema.ts...`,
  verification: { type: 'exit_code' },
})

// Deterministic verification — did the edit actually land?
.step('verify-schema', {
  type: 'deterministic',
  dependsOn: ['edit-schema'],
  command: `if git diff --quiet packages/web/lib/db/schema.ts; then echo "NOT MODIFIED"; exit 1; fi
grep "my_new_table" packages/web/lib/db/schema.ts >/dev/null && echo "OK" || (echo "MISSING"; exit 1)`,
  failOnError: true,
  captureOutput: true,
})

What to verify:

  • File was actually modified (the gate fails when git diff --quiet exits 0, i.e. the file has no unstaged changes)
  • Key content exists (grep for table names, function names, imports)
  • For new files: file_exists verification type

What NOT to verify:

  • Exact content (too brittle — agents format differently)
  • Line counts or byte sizes (meaningless)
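The same two checks, "the file changed" and "key content is present", can also be run without git, for instance when the target file is untracked and `git diff` would report nothing. In this sketch, the hypothetical `snapshot` and `verifyEdit` helpers compare a SHA-256 hash taken before the agent step with the file afterwards and then look for required markers; it illustrates the idea and is not a built-in verification type.

```typescript
import { createHash } from 'node:crypto';
import { mkdtempSync, readFileSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

// Content hash of a file, taken before the agent step runs.
function snapshot(path: string): string {
  return createHash('sha256').update(readFileSync(path)).digest('hex');
}

// Run after the agent step. Returns the problems found; an empty list means OK.
function verifyEdit(path: string, before: string, mustContain: string[]): string[] {
  const problems: string[] = [];
  if (snapshot(path) === before) problems.push('NOT MODIFIED');
  const text = readFileSync(path, 'utf8');
  for (const marker of mustContain) {
    if (!text.includes(marker)) problems.push(`MISSING: ${marker}`);
  }
  return problems;
}

// Demo against a temporary file; the second write stands in for the agent's edit.
const file = join(mkdtempSync(join(tmpdir(), 'verify-')), 'schema.ts');
writeFileSync(file, 'export const existing = 1;\n');
const before = snapshot(file);
writeFileSync(file, 'export const existing = 1;\nexport const my_new_table = {};\n');
console.log(verifyEdit(file, before, ['my_new_table'])); // []
```

Like the grep in the shell gate, this checks only for key markers, not exact content, so it tolerates agents formatting code differently.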

Mock Sandbox Pattern

When testing code that interacts with Daytona sandboxes, use inline mock objects matching the existing test conventions:

const daytona = {
  create: async () => ({
    id: 'sandbox-id',
    process: {
      executeCommand: async (cmd: string, cwd?: string, env?: Record<string, string>) => ({
        result: 'output',
        exitCode: 0,
      }),
    },
    fs: {
      uploadFile: async () => undefined,
    },
    getUserHomeDir: async () => '/home/daytona',
  }),
  remove: async () => undefined,
};
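To exercise failure paths as well, the inline mock can be turned into a small factory that records commands and returns scripted exit codes. `createMockDaytona`, `MockSandbox`, and the `exitCodes` option are hypothetical, shaped after the inline mock above rather than the Daytona SDK's real types:

```typescript
// Hypothetical minimal shapes, covering only the fields the inline mock uses.
interface ExecResult {
  result: string;
  exitCode: number;
}
interface MockSandbox {
  id: string;
  process: {
    executeCommand: (cmd: string, cwd?: string, env?: Record<string, string>) => Promise<ExecResult>;
  };
  fs: { uploadFile: (...args: unknown[]) => Promise<void> };
  getUserHomeDir: () => Promise<string>;
}

// Commands succeed unless listed in `exitCodes`; every command and removal
// is recorded so tests can assert on what the code under test did.
function createMockDaytona(exitCodes: Record<string, number> = {}) {
  const commands: string[] = [];
  const removed: string[] = [];
  const daytona = {
    create: async (): Promise<MockSandbox> => ({
      id: 'sandbox-id',
      process: {
        executeCommand: async (cmd: string) => {
          commands.push(cmd);
          const exitCode = exitCodes[cmd] ?? 0;
          return { result: exitCode === 0 ? 'output' : 'error', exitCode };
        },
      },
      fs: { uploadFile: async () => undefined },
      getUserHomeDir: async () => '/home/daytona',
    }),
    remove: async (sandbox: MockSandbox) => {
      removed.push(sandbox.id);
    },
  };
  return { daytona, commands, removed };
}
```

Scripting `{ 'npm test': 1 }`, for example, lets a test confirm that your code reports a failed sandbox command and still removes the sandbox afterwards.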

For testing that your code calls the right methods, record calls in an array:

const emitted: EmitEventOptions[] = [];
const mockClient: SessionEventClient = {
  emit: async (opts) => {
    emitted.push(opts);
  },
  getEvents: async () => [],
  getLatestSequence: async () => 0,
};

// ... run the code ...

assert.equal(emitted.length, 4);
assert.equal(emitted[0].eventType, 'sandbox_created');
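When several methods need recording, a tiny generic wrapper saves hand-writing a `push` in every mock. `spy` below is a hypothetical helper, not part of any library; Node's built-in `node:test` module also offers `mock.fn()`, which records calls in `fn.mock.calls`, if you prefer a built-in.

```typescript
// Wraps an implementation and records the arguments of every call.
function spy<A extends unknown[], R>(impl: (...args: A) => R) {
  const calls: A[] = [];
  const fn = (...args: A): R => {
    calls.push(args);
    return impl(...args);
  };
  return Object.assign(fn, { calls });
}

// Hypothetical event shape standing in for EmitEventOptions.
interface EventOpts {
  eventType: string;
}

const emit = spy(async (_opts: EventOpts) => {});
void emit({ eventType: 'sandbox_created' });
void emit({ eventType: 'sandbox_removed' });

console.log(emit.calls.map(([opts]) => opts.eventType));
// [ 'sandbox_created', 'sandbox_removed' ]
```

Passing `emit` as the `emit` field of a mock client gives the same assertions as the hand-written array, with the recording logic in one place.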

Regression Testing

After your new tests pass, always run the existing test suite to catch regressions:

.step('run-existing-tests', {
  type: 'deterministic',
  dependsOn: ['fix-build'],
  command: 'npm run orchestrator:test 2>&1 | tail -40',
  captureOutput: true,
  failOnError: false,
})

.step('fix-regressions', {
  agent: 'impl',
  dependsOn: ['run-existing-tests'],
  task: `Check the full test suite for regressions caused by our changes.

Test output:
{{steps.run-existing-tests.output}}

If all tests passed, do nothing.
If EXISTING tests broke, read the failing test, find what we broke, fix it.
Most likely cause: constructor signatures changed, new required fields added
without defaults, or import paths shifted.

Run: npm run orchestrator:test
Fix until all tests pass.`,
  verification: { type: 'exit_code' },
})
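The fix step's first instruction, "if all tests passed, do nothing", can also be decided deterministically when the runner prints a summary. This sketch assumes Node's built-in test runner, whose TAP reporter ends with lines such as `# pass 12` and `# fail 0` and whose spec reporter prints the same counts with an `ℹ` prefix; other runners (and whatever `npm run orchestrator:test` invokes) may need a different pattern. `parseSummary` and `needsFix` are hypothetical names.

```typescript
interface TestSummary {
  pass: number;
  fail: number;
}

// Extracts pass/fail counts from node:test output (TAP "#" or spec "ℹ" prefix).
// Returns null when no summary is found, e.g. the runner crashed before reporting.
function parseSummary(output: string): TestSummary | null {
  const count = (label: string): number | null => {
    const m = output.match(new RegExp(`^\\s*(?:#|ℹ)\\s*${label}\\s+(\\d+)\\s*$`, 'm'));
    return m ? Number(m[1]) : null;
  };
  const pass = count('pass');
  const fail = count('fail');
  return pass === null || fail === null ? null : { pass, fail };
}

// A missing summary counts as "needs fixing": there is no evidence the tests are green.
const needsFix = (output: string): boolean => (parseSummary(output)?.fail ?? 1) > 0;

const sample = ['# tests 3', '# suites 1', '# pass 2', '# fail 1'].join('\n');
console.log(parseSummary(sample)); // { pass: 2, fail: 1 }
console.log(needsFix(sample)); // true
```

Treating "no summary" as a failure matters here because the step keeps only the tail of the output, and a crash before the summary would otherwise read as "nothing to fix".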

Full Workflow Template

Here's the complete pattern for a feature that touches the database:

import { workflow } from '@agent-relay/sdk/workflows';

const result = await workflow('my-feature')
  .description('Add feature X with full E2E validation')
  .pattern('dag')
  .channel('wf-my-feature')
  .maxConcurrency(3)
  .timeout(3_600_000)

  .agent('impl', { cli: 'claude', preset: 'worker', retries: 2 })
  .agent('tester', { cli: 'claude', preset: 'worker', retries: 2 })

  // ── Phase 1: Read ────────────────────────────────────────────────
  .step('read-target', {
    type: 'deterministic',
    command: 'cat path/to/file.ts',
    captureOutput: true,
  })

  // ── Phase 2: Implement ───────────────────────────────────────────
  .step('edit-target', {
    agent: 'impl',
    dependsOn: ['read-target'],
    task: `Edit path/to/file.ts. Current contents:
{{steps.read-target.output}}
<specific instructions>
Only edit this one file.`,
    verification: { type: 'exit_code' },
  })
  .step('verify-target', {
    type: 'deterministic',
    dependsOn: ['edit-target'],
    command: 'if git diff --quiet path/to/file.ts; then echo "NOT MODIFIED"; exit 1; fi; echo "OK"',
    failOnError: true,
    captureOutput: true,
  })

  // ── Phase 3: Test infrastructure ─────────────────────────────────
  .step('install-pglite', {
    type: 'deterministic',
    command: 'npm install --save-dev @electric-sql/pglite 2>&1 | tail -5',
    captureOutput: true,
  })
  .step('create-test-helpers', {
    agent: 'tester',
    dependsOn: ['install-pglite'],
    task: 'Create tests/helpers/pglite-db.ts with <DDL for your tables>...',
    verification: { type: 'file_exists', value: 'tests/helpers/pglite-db.ts' },
  })
  .step('create-tests', {
    agent: 'tester',
    dependsOn: ['create-test-helpers', 'verify-target'],
    task: 'Create tests/my-feature.test.ts with <test descriptions>...',
    verification: { type: 'file_exists', value: 'tests/my-feature.test.ts' },
  })

  // ── Phase 4: Test-fix-rerun loop ─────────────────────────────────
  .step('run-tests', {
    type: 'deterministic',
    dependsOn: ['create-tests'],
    command: 'npx tsx --test tests/my-feature.test.ts 2>&1 | tail -60',
    captureOutput: true,
    failOnError: false,
  })
  .step('fix-tests', {
    agent: 'tester',
    dependsOn: ['run-tests'],
    task: `Fix any test failures. Output:\n{{steps.run-tests.output}}`,
    verification: { type: 'exit_code' },
  })
  .step('run-tests-final', {
    type: 'deterministic',
    dependsOn: ['fix-tests'],
    command: 'npx tsx --test tests/my-feature.test.ts 2>&1',
    captureOutput: true,
    failOnError: true,
  })

  // ── Phase 5: Build + regression ──────────────────────────────────
  .step('build-check', {
    type: 'deterministic',
    dependsOn: ['run-tests-final'],
    command: 'out=$(npx tsc --noEmit 2>&1); code=$?; echo "$out" | tail -20; echo "EXIT: $code"',
    captureOutput: true,
    failOnError: false,
  })
  .step('fix-build', {
    agent: 'impl',
    dependsOn: ['build-check'],
    task: `Fix type errors if any. Output:\n{{steps.build-check.output}}`,
    verification: { type: 'exit_code' },
  })
  .step('run-existing-tests', {
    type: 'deterministic',
    dependsOn: ['fix-build'],
    command: 'npm test 2>&1 | tail -40',
    captureOutput: true,
    failOnError: false,
  })
  .step('fix-regressions', {
    agent: 'impl',
    dependsOn: ['run-existing-tests'],
    task: `Fix regressions if any. Output:\n{{steps.run-existing-tests.output}}`,
    verification: { type: 'exit_code' },
  })

  // ── Phase 6: Commit ──────────────────────────────────────────────
  .step('commit', {
    type: 'deterministic',
    dependsOn: ['fix-regressions'],
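    // Re-check the build and the full suite as hard gates before committing
    command: 'npx tsc --noEmit && npm test && git add <files> && git commit -m "feat: ..."',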
    captureOutput: true,
    failOnError: true,
  })

  .onError('retry', { maxRetries: 2, retryDelayMs: 10_000 })
  .run({ cwd: process.cwd() });
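Because the template uses `.pattern('dag')`, it can help to see which steps are able to overlap. The sketch below layers a hypothetical `{ name, dependsOn }` list into waves, where every step depends only on steps in earlier waves. It is a plain topological sort; it does not model `maxConcurrency`, retries, or how the SDK actually schedules steps.

```typescript
interface StepDef {
  name: string;
  dependsOn?: string[];
}

// Topological layering: each wave only depends on steps in earlier waves.
// Throws on duplicate names, unknown dependencies, and cycles.
function waves(steps: StepDef[]): string[][] {
  const names = new Set(steps.map((s) => s.name));
  if (names.size !== steps.length) throw new Error('duplicate step names');
  for (const s of steps) {
    for (const d of s.dependsOn ?? []) {
      if (!names.has(d)) throw new Error(`${s.name} depends on unknown step ${d}`);
    }
  }
  const done = new Set<string>();
  const result: string[][] = [];
  while (done.size < steps.length) {
    const ready = steps
      .filter((s) => !done.has(s.name) && (s.dependsOn ?? []).every((d) => done.has(d)))
      .map((s) => s.name);
    if (ready.length === 0) throw new Error('cycle in dependsOn');
    ready.forEach((name) => done.add(name));
    result.push(ready);
  }
  return result;
}

// The dependsOn graph of the template above.
const template: StepDef[] = [
  { name: 'read-target' },
  { name: 'edit-target', dependsOn: ['read-target'] },
  { name: 'verify-target', dependsOn: ['edit-target'] },
  { name: 'install-pglite' },
  { name: 'create-test-helpers', dependsOn: ['install-pglite'] },
  { name: 'create-tests', dependsOn: ['create-test-helpers', 'verify-target'] },
  { name: 'run-tests', dependsOn: ['create-tests'] },
  { name: 'fix-tests', dependsOn: ['run-tests'] },
  { name: 'run-tests-final', dependsOn: ['fix-tests'] },
  { name: 'build-check', dependsOn: ['run-tests-final'] },
  { name: 'fix-build', dependsOn: ['build-check'] },
  { name: 'run-existing-tests', dependsOn: ['fix-build'] },
  { name: 'fix-regressions', dependsOn: ['run-existing-tests'] },
  { name: 'commit', dependsOn: ['fix-regressions'] },
];

console.log(waves(template)[0]); // [ 'read-target', 'install-pglite' ]
```

For this template only the first two waves hold more than one step, so nearly the whole workflow runs as a chain. The same function also rejects a `dependsOn` entry that names a step that does not exist, a typo that would otherwise leave a step waiting forever or running too early.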

Checklist: Is Your Workflow 80-to-100?

| Check | How |
|---|---|
| Tests exist | file_exists verification on the test file |
| Tests actually run | A deterministic step executes them |
| Test failures get fixed | An agent step reads the output, fixes, and re-runs |
| Final test run is hard-gated | failOnError: true on the last test step |
| Build passes | npx tsc --noEmit as a deterministic step |
| No regressions | Existing test suite runs after changes |
| Every edit is verified | git diff --quiet + grep after each agent edit |
| Commit only happens after all gates | dependsOn chains to final verification |
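The last row can be checked mechanically: collect everything the commit step depends on, directly or transitively, and confirm each gate is in that set. `ancestors` and `ungatedBy` are hypothetical helpers over a plain `{ name, dependsOn }` list, not part of the SDK:

```typescript
interface StepDef {
  name: string;
  dependsOn?: string[];
}

// All steps that `target` depends on, directly or transitively.
function ancestors(steps: StepDef[], target: string): Set<string> {
  const deps = new Map(steps.map((s) => [s.name, s.dependsOn ?? []] as const));
  const seen = new Set<string>();
  const stack = [...(deps.get(target) ?? [])];
  while (stack.length > 0) {
    const name = stack.pop()!;
    if (seen.has(name)) continue;
    seen.add(name);
    stack.push(...(deps.get(name) ?? []));
  }
  return seen;
}

// Returns the gate steps that `commitStep` does NOT wait for.
function ungatedBy(steps: StepDef[], commitStep: string, gates: string[]): string[] {
  const upstream = ancestors(steps, commitStep);
  return gates.filter((g) => !upstream.has(g));
}

// verify-edit runs, but nothing downstream depends on it.
const steps: StepDef[] = [
  { name: 'edit' },
  { name: 'verify-edit', dependsOn: ['edit'] },
  { name: 'run-tests-final', dependsOn: ['edit'] },
  { name: 'commit', dependsOn: ['run-tests-final'] },
];

console.log(ungatedBy(steps, 'commit', ['verify-edit', 'run-tests-final'])); // [ 'verify-edit' ]
```

A verify step that nothing depends on is the usual culprit. In the template above, `create-tests` depending on `verify-target` is what places the edit gate upstream of the commit.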

Common Anti-Patterns

| Anti-pattern | Why it fails | Fix |
|---|---|---|
| Tests written but never executed | Agent claims they pass; they don't | Add a deterministic run-tests step |
| Single failOnError: true test run | First failure kills the workflow; no chance to fix | Use the three-step test-fix-rerun pattern |
| No regression test | New feature works, old features break | Run npm test after the build check |
| Agent asked to "write and run tests" in one step | Agent writes tests, runs them, they fail, it edits, and the output is garbled | Separate write, run, and fix into distinct steps |
| PGlite DDL doesn't match Drizzle schema | Tests pass against the wrong schema | Derive the DDL from schema.ts or test with the real migrations |
| failOnError: false on final test run | Broken tests get committed | Always set failOnError: true on the gate step |
| Testing only the happy path | Edge cases break in prod | Specify edge-case tests in the task prompt |
| No verify gate after agent edits | Agent exits 0 without writing anything | Add a git diff --quiet check after every edit |
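Two of these anti-patterns are mechanical enough to lint for. The sketch below runs over a hypothetical plain-object list of steps (field names mirror the options used in this document; it does not read a real `workflow()` builder): the final test step must set `failOnError: true`, and every agent step must be followed by a deterministic step or carry a `file_exists` verification.

```typescript
// Hypothetical plain-object view of a step; field names mirror the
// options used in this document.
interface LintStep {
  name: string;
  type?: 'deterministic';
  agent?: string;
  dependsOn?: string[];
  failOnError?: boolean;
  verification?: { type: string; value?: string };
}

function lint(steps: LintStep[], finalTestStep: string): string[] {
  const warnings: string[] = [];

  // Anti-pattern: failOnError: false on the final test run. This sketch also
  // flags an unset value rather than assume what the default is.
  const final = steps.find((s) => s.name === finalTestStep);
  if (!final) warnings.push(`final test step "${finalTestStep}" not found`);
  else if (final.failOnError !== true) warnings.push(`"${finalTestStep}" must set failOnError: true`);

  // Anti-pattern: no verify gate after an agent step.
  for (const step of steps) {
    if (!step.agent || step.verification?.type === 'file_exists') continue;
    const gated = steps.some(
      (s) => s.type === 'deterministic' && (s.dependsOn ?? []).includes(step.name),
    );
    if (!gated) warnings.push(`agent step "${step.name}" has no deterministic step after it`);
  }
  return warnings;
}

const steps: LintStep[] = [
  { name: 'edit-target', agent: 'impl' },
  { name: 'verify-target', type: 'deterministic', dependsOn: ['edit-target'], failOnError: true },
  { name: 'create-tests', agent: 'tester', verification: { type: 'file_exists', value: 'tests/my-feature.test.ts' } },
  { name: 'fix-tests', agent: 'tester', dependsOn: ['create-tests'] },
  { name: 'run-tests-final', type: 'deterministic', dependsOn: ['fix-tests'], failOnError: false },
];

console.log(lint(steps, 'run-tests-final'));
// one warning: "run-tests-final" must set failOnError: true
```

The other rows (edge-case coverage, DDL drift, combined write-and-run prompts) depend on what the task prompts say, so they are better caught in review than by a lint like this.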

technical

github: AgentWorkforce/relay
stars: 628
license: Apache-2.0
contributors: 14
last commit: 2026-04-21T02:25:55Z
file: .claude/skills/relay-80-100-workflow/SKILL.md
