Anonymous View
Skip to content

refactor(routes): encapsulate standalone-vs-platform mode behind a strategy#127

Draft
njbrake wants to merge 2 commits into
mainfrom
feat/request-mode-strategy
Draft

refactor(routes): encapsulate standalone-vs-platform mode behind a strategy#127
njbrake wants to merge 2 commits into
mainfrom
feat/request-mode-strategy

Conversation

@njbrake

@njbrake njbrake commented Jun 9, 2026

Copy link
Copy Markdown
Member

Note: this PR description was drafted by Claude via back-and-forth with @njbrake. The reasoning and decisions are his; the prose is Claude's.

Summary

Fixes #102. The chat, messages, and responses handlers each read config.is_platform_mode and re-branched on it inline at many sites (chat 15, responses 13, messages 11), interleaving the two modes' credential-resolution and usage-reporting logic in one long function body. A bug had to be fixed in three places and it was easy to add a path that silently worked in only one mode.

  • New module src/gateway/api/routes/_mode_strategy.py with two seams, selected once per request:
    • RequestModeStrategy (StandaloneStrategy / PlatformStrategy): resolve() runs the platform resolve call or the auth plus budget-reservation flow and stashes the per-request state; resolve_mcp_servers() replaces the mcp_server_ids mode branch; make_settlement() builds the right settlement.
    • RequestSettlement (StandaloneSettlement / PlatformSettlement): supplies the on_success / on_error / on_no_usage / on_incomplete hooks the streaming generators already accept, replacing the inline if platform_mode branches in the streaming callbacks and the standalone non-streaming tail.
  • The three handlers shed about 529 net lines; the top resolve block becomes strategy.resolve(spec=...).
  • Dispatch mechanics (the run_platform_attempts runner vs the standalone single call, the streaming fallback-vs-single split) stay in the handlers, as the issue permits.

Scope notes

  • Lighter seam, not full convergence onto run_platform_attempts. Standalone keeps its own dispatch because get_provider_kwargs returns richer credentials (client_args, api_version, Vertex env) than ResolvedAttempt models, so routing it through the attempt-based runner would silently drop them.
  • No behavior change on the wire for the refactor itself: status codes, headers, and budget reconcile/refund semantics are preserved.

Bundled bug fix: streaming pre-commit reservation leak

Investigating the seam surfaced a pre-existing standalone bug (introduced when the reservation system landed, not by this refactor). On /v1/messages and /v1/responses, a streaming request whose provider call failed before the first chunk logged the error but did not refund the budget reservation, while /v1/chat/completions and all non-streaming paths did. The leaked hold was never released (a budget-period reset zeroes spend but not reserved), so repeated pre-commit streaming failures accumulated reserved until the user hit a spurious "budget exceeded" 403 with low actual spend.

The fix routes the messages/responses pre-commit path through settlement.on_error (log plus refund), matching chat and the non-streaming handlers, and removes the now-dead on_provider_error_precommit hook. In platform mode this also reports an error outcome upstream on a pre-commit failure, consistent with how mid-stream errors already report. The HTTP status and body are unchanged; only users.reserved is released instead of leaked.

Test plan

  • make lint (ruff) and make typecheck (mypy strict, 154 files)
  • New tests/unit/test_mode_strategy.py: strategy selection, both settlements, no-DB reject paths
  • New tests/integration/test_mode_strategy_e2e.py (11 tests): standalone reconcile/refund settlement, the now-uniform pre-commit refund across all three endpoints, mcp_server_ids resolution, and platform request-id/correlation headers plus upstream usage reporting
  • Full suite green: 749 passed, 10 env-gated skips, including the platform-mode and standalone chat/messages/responses suites unchanged
  • uv run python scripts/generate_openapi.py --check (no schema change)

🤖 Generated with Claude Code

…rategy

The chat, messages, and responses handlers each consumed
config.is_platform_mode and re-branched on it inline at many sites,
interleaving the two modes' credential-resolution and usage-reporting logic in
one long function body. A bug had to be fixed in three places and it was easy
to add a path that silently worked in only one mode.

Introduce a RequestModeStrategy seam (src/gateway/api/routes/_mode_strategy.py)
selected once per request:

- resolve() runs the platform resolve call (platform) or auth plus budget
  reservation (standalone) and stashes the per-request state the handler reads.
- A RequestSettlement object supplies the usage-settlement hooks the streaming
  generators already accept, replacing the inline "if platform_mode" branches
  in the streaming callbacks and the standalone non-streaming tail.

Dispatch mechanics (the run_platform_attempts runner vs the standalone single
call, the streaming fallback-vs-single split) stay in the handlers; only
credential resolution and usage settlement move behind the seam. Standalone
keeps its own dispatch because its provider kwargs are richer than
ResolvedAttempt can model, so routing it through the attempt-based runner would
silently drop credentials.

No behavior change on the wire: status codes, headers, and budget
reconcile/refund semantics are preserved, including the messages/responses
pre-commit log-without-refund path. Adds tests/unit/test_mode_strategy.py
covering each strategy and settlement in isolation; the existing standalone and
platform integration suites pass unchanged.

Fixes #102

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@njbrake njbrake temporarily deployed to integration-tests June 9, 2026 16:25 — with GitHub Actions Inactive
@coderabbitai

coderabbitai Bot commented Jun 9, 2026

Copy link
Copy Markdown

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: d7a1b221-14fb-4734-92d3-44385c317840

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/request-mode-strategy
✨ Simplify code
  • Create PR with simplified code
  • Commit simplified code in branch feat/request-mode-strategy

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

…errors

Standalone /v1/messages and /v1/responses streaming requests whose provider
call failed before the first chunk logged the error but did not refund the
budget reservation, while /v1/chat/completions and all non-streaming paths did.
The leaked hold was never released (a budget-period reset zeroes spend but not
reserved), so repeated pre-commit streaming failures accumulated reserved until
the user hit a spurious "budget exceeded" 403 with low actual spend. The
asymmetry predates the strategy refactor; this makes the behavior uniform.

Route the messages/responses pre-commit except through settlement.on_error
(log plus refund), matching chat and the non-streaming handlers, and remove the
now-dead on_provider_error_precommit hook. In platform mode this also reports
an error outcome upstream on a pre-commit failure, consistent with how
mid-stream errors already report. No wire change: the HTTP status and body are
unchanged; only users.reserved is released instead of leaked.

Add tests/integration/test_mode_strategy_e2e.py exercising the strategy seam
end-to-end in both modes: standalone reconcile/refund settlement, the
now-uniform pre-commit refund across all three endpoints, mcp_server_ids
resolution, and the platform request-id/correlation headers plus upstream usage
reporting.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Encapsulate standalone-vs-platform mode behind a strategy instead of inline branches in every handler

1 participant