test(harness): shared test fakes + conformance determinism fix #427

Merged
declan-scale merged 2 commits into next from declan-scale/harness-test-fakes on Jun 23, 2026

@declan-scale (Contributor) commented Jun 23, 2026

Summary

First slice of #425 (post-merge harness cleanup), scoped to non-breaking test infrastructure so it can land independently.

  • Shared test fakes — extract tests/lib/core/harness/_fakes.py (FakeSpan/FakeTracing), removing ~9 duplicated in-file copies.
  • Conformance determinism fix (Greptile P1): test_span_derivation_is_deterministic now iterates all_fixtures() at runtime, paired with a new conformance conftest.py that eagerly imports every per-harness conformance module so the fixture set is fully populated regardless of collection order. Previously, import-time parametrization silently dropped per-harness coverage.
  • Removes the now-landed pr4-pydantic-ai planning doc.
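For readers without the branch checked out, here is a minimal sketch of the kind of shared recording double _fakes.py provides. It is not the actual file: the `start_span` / `end_span` method names and the `payload` field are assumptions, while the `started` / `ended` tuple lists and the `started_names` / `ended_outputs` views mirror what the Greptile review on this PR describes.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class FakeSpan:
    """Hypothetical span double: records its name, input payload, and output."""
    name: str
    payload: Any = None
    output: Any = None
    ended: bool = False


@dataclass
class FakeTracing:
    """Hypothetical recording tracer; method names are assumptions, not the real API."""
    started: list[tuple[str, Any]] = field(default_factory=list)
    ended: list[tuple[FakeSpan, Any]] = field(default_factory=list)

    def start_span(self, name: str, payload: Any = None) -> FakeSpan:
        # Record every span start as a (name, payload) tuple.
        self.started.append((name, payload))
        return FakeSpan(name=name, payload=payload)

    def end_span(self, span: FakeSpan, output: Any = None) -> None:
        # Mark the span finished and record a (span, output) tuple.
        span.output = output
        span.ended = True
        self.ended.append((span, output))

    @property
    def started_names(self) -> list[str]:
        # Convenience view for tests that only assert on span names.
        return [name for name, _ in self.started]

    @property
    def ended_outputs(self) -> list[Any]:
        # Convenience view for tests that only assert on span outputs.
        return [output for _, output in self.ended]
```

Keeping the full tuples and exposing narrow views as properties is what lets one fake serve both the tests that inspect whole span records and the ones that only compare names or outputs.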

Test plan

  • pytest tests/lib/core/harness/ — 122 passed, 1 skipped

Notes

Part of a stacked split of #425. No source changes; safe to merge first.

🤖 Generated with Claude Code

Greptile Summary

This PR is a pure test-infrastructure cleanup with no source code changes. It extracts shared FakeSpan/FakeTracing test doubles into tests/lib/core/harness/_fakes.py, eliminating roughly nine in-file duplicate definitions. It also fixes a conformance determinism gap: test_span_derivation_is_deterministic previously parametrized on all_fixtures() at import time (missing any harness whose module hadn't been collected yet) and now calls all_fixtures() inside the test body, backed by a new conftest.py that eagerly imports every per-harness conformance module so registration is always complete.
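The collection-order gap can be reproduced without pytest. In this sketch a hypothetical list-backed registry stands in for all_fixtures(). The key point is that the argument to `@pytest.mark.parametrize` is evaluated once, when the decorator runs at import, so it behaves like `IMPORT_TIME_PARAMS` below, whereas a call inside the test body sees everything registered by the time the test runs.

```python
# Hypothetical stand-in for the conformance fixture registry.
_REGISTRY: list[str] = []


def register(name: str) -> None:
    _REGISTRY.append(name)


def all_fixtures() -> list[str]:
    # Return a copy of whatever has been registered *so far*.
    return list(_REGISTRY)


register("core")  # registered by the module that defines the test

# Import-time parametrization: the list is captured once, right now.
IMPORT_TIME_PARAMS = all_fixtures()

# Later, per-harness modules are collected and register their fixtures.
register("claude_code")
register("pydantic_ai")


def check_all_fixtures() -> int:
    # Runtime lookup: evaluated when the test body runs, after all imports.
    fixtures = all_fixtures()
    # Guard: fail loudly instead of silently passing over a partial set.
    assert len(fixtures) > 1, "per-harness fixtures were not registered"
    for name in fixtures:
        # A real test would derive spans twice here and compare the results.
        assert isinstance(name, str)
    return len(fixtures)
```

The guard matters as much as the runtime call: without it, a registry that was never populated would produce a loop over nothing and a green test.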

  • Shared fakes: FakeSpan / FakeTracing extracted to _fakes.py; all local _FakeTracing / _RecordingTracing / _FakeSpan variants replaced with the shared versions throughout test_tracer.py, test_emitter.py, test_auto_send.py, test_yield_delivery.py, and runner.py.
  • Determinism fix: test_conformance.py::test_span_derivation_is_deterministic is now a single non-parametrized test that calls all_fixtures() at run time, with a guard asserting that per-harness fixtures were registered; conftest.py ensures all conformance modules are imported before any test executes.
  • asyncio.run() at import removed: test_pydantic_ai_conformance.py and test_codex_conformance.py now use the shared run_pure_async loop-free driver; the driver itself is promoted from a private copy in test_claude_code_conformance.py to a public export in runner.py.
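The body of run_pure_async is not shown in this PR. A common loop-free technique, sketched here under the assumption that fixture builders only await other in-memory coroutines, is to step the coroutine with `send(None)` until it raises StopIteration. A bare `yield` (which is what `asyncio.sleep(0)` produces) can simply be resumed, but any other yielded value, such as a pending asyncio Future, needs a real event loop to resume it, so the driver fails loudly instead of hanging.

```python
from collections.abc import Coroutine
from typing import Any, TypeVar

T = TypeVar("T")


def run_pure_async(coro: Coroutine[Any, Any, T]) -> T:
    """Sketch of a loop-free driver for coroutines that never block on I/O."""
    try:
        while True:
            # Advance the coroutine; nested awaits of plain coroutines run inline.
            yielded = coro.send(None)
            if yielded is not None:
                # A non-None yield (e.g. a Future) needs an event loop to resume.
                coro.close()
                raise RuntimeError(
                    f"coroutine awaited {yielded!r}; it is not loop-free"
                )
            # A bare yield (None) is just a cooperative pause: resume it.
    except StopIteration as stop:
        # The coroutine's return value travels on StopIteration.value.
        return stop.value
```

Because no loop is created or looked up, this behaves the same whether or not the caller is already inside a running event loop, which is the situation that made asyncio.run() at import fail.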

Confidence Score: 5/5

Safe to merge — no source code changes, only test infrastructure refactoring with a determinism bug fix.

All changes are confined to test files. The shared FakeTracing is a behavioral superset of every local copy it replaces (same started/ended tuple format, same span-object shape). The run_pure_async driver is sound for pure in-memory coroutines and was already proven correct by the existing claude_code_conformance suite. The conftest eager-import strategy is safe because Python's module cache prevents double-registration. No edge cases that could silently hide failures were found.
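The eager-import conftest relies on that module-cache behavior. A minimal sketch, assuming the conftest imports modules by dotted name via importlib (the real conftest may simply use plain import statements): the first import executes each module body, and therefore its fixture registration, while repeat imports return the cached module object from sys.modules without re-running it.

```python
import importlib
import sys
from types import ModuleType


def eager_import(names: list[str]) -> list[ModuleType]:
    """Import each module for its side effects (e.g. fixture registration)."""
    modules = []
    for name in names:
        # First import runs the module body; later imports of the same name
        # return the cached module from sys.modules without re-executing it.
        modules.append(importlib.import_module(name))
    return modules


def is_cached(name: str) -> bool:
    # True once a module has been imported anywhere in this process.
    return name in sys.modules
```

A conftest.py could call `eager_import([...])` at module level with the dotted names of the per-harness conformance modules; those names, and whether the real file uses importlib at all, are not shown here.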

No files require special attention.

Important Files Changed

  • tests/lib/core/harness/_fakes.py: New shared test-doubles file; FakeSpan/FakeTracing form a superset of all previously local variants and expose started_names/started_pairs/ended_outputs convenience properties for backward-compatible assertions.
  • tests/lib/core/harness/conformance/conftest.py: New conftest that eagerly imports all five per-harness conformance modules for their registration side effects; test_openai_conformance.py confirmed to exist; no asyncio.run() in any imported module.
  • tests/lib/core/harness/conformance/runner.py: Removes the local _FakeTracing (replaced by the shared FakeTracing), promotes run_pure_async from test_claude_code_conformance.py to a public export, and cleans up stale ticket-reference comments.
  • tests/lib/core/harness/conformance/test_conformance.py: test_span_derivation_is_deterministic converted from import-time parametrization to a runtime loop with a guard ensuring per-harness fixtures are registered; a sound fix for the collection-order determinism gap.
  • tests/lib/core/harness/conformance/test_pydantic_ai_conformance.py: asyncio.run() replaced with run_pure_async for all four module-level fixture builds; per-harness determinism test removed (consolidated into test_conformance.py).
  • tests/lib/core/harness/conformance/test_codex_conformance.py: asyncio.run() in _build() replaced with run_pure_async; local span-derivation test removed (covered by the consolidated test in test_conformance.py).
  • tests/lib/core/harness/test_tracer.py: _FakeSpan/_FakeTracing removed; FakeTracing imported from _fakes.py; the shared implementation is an exact behavioral match for the removed local version.
  • tests/lib/core/harness/test_auto_send.py: Local _RecordTracing removed; assertions updated from .started/.ended to .started_names/.ended_outputs to match the richer tuple storage in the shared FakeTracing.
  • tests/lib/core/harness/test_yield_delivery.py: Local _RecordTracing removed; assertions updated to .started_names/.ended_outputs; otherwise identical behavior.

Reviews (5): Last reviewed commit: "test(harness): build conformance fixture..."

@declan-scale (Contributor, Author) commented:

@greptile review


danielmillerp self-requested a review June 23, 2026 19:28
…rminism fix

Extract tests/lib/core/harness/_fakes.py (FakeSpan/FakeTracing), removing
~9 duplicated copies, and harden the conformance determinism test: it now
iterates all_fixtures() at runtime (paired with a conftest that eagerly
imports every per-harness conformance module) instead of import-time
parametrization, which had silently dropped per-harness coverage (Greptile P1).

Also removes the now-landed pr4-pydantic-ai planning doc.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
declan-scale force-pushed the declan-scale/harness-test-fakes branch from 5953df6 to a84f83b June 23, 2026 19:34
Comment thread tests/lib/core/harness/conformance/conftest.py
Address review: the codex and pydantic-ai conformance modules called
asyncio.run() at import time, which raises RuntimeError when collected under an
already-running event loop (programmatic pytest, notebooks) — so even a focused
run could fail during collection. Hoist the loop-free driver from the
claude-code module into runner.run_pure_async and use it everywhere fixtures are
built at import time.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
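The failure mode this commit addresses is easy to reproduce. In this sketch `build_fixture` is a hypothetical stand-in for a conformance fixture build, and `collect_under_running_loop` simulates collection happening inside an already-running loop: there `asyncio.run()` refuses to start and raises RuntimeError, while the same call made at top level succeeds.

```python
import asyncio


async def build_fixture() -> str:
    # Hypothetical stand-in for a conformance fixture build.
    return "fixture"


def build_at_import_time() -> str:
    # The pattern this commit removes: calling asyncio.run() at module level.
    coro = build_fixture()
    try:
        return asyncio.run(coro)
    except RuntimeError:
        # asyncio.run() refused to start; close the coroutine so it is not
        # reported as "never awaited", then re-raise the original error.
        coro.close()
        raise


async def collect_under_running_loop() -> str:
    # Simulates collection driven from inside a loop (notebook, programmatic pytest).
    try:
        return build_at_import_time()
    except RuntimeError as exc:
        return f"RuntimeError: {exc}"
```

Run at top level, `build_at_import_time()` returns normally; run via `asyncio.run(collect_under_running_loop())`, it reports that asyncio.run() cannot be called from a running event loop, which is why the fixtures switched to a loop-free driver.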
declan-scale merged commit 7f6d70a into next Jun 23, 2026
57 checks passed
declan-scale deleted the declan-scale/harness-test-fakes branch June 23, 2026 20:10