refactor(routes): encapsulate standalone-vs-platform mode behind a strategy by njbrake · Pull Request #127 · mozilla-ai/otari

njbrake · 2026-06-09T16:25:14Z

Note: this PR description was drafted by Claude via back-and-forth with @njbrake. The reasoning and decisions are his; the prose is Claude's.

Summary

Fixes #102. The chat, messages, and responses handlers each read config.is_platform_mode and re-branched on it inline at many sites (chat 15, responses 13, messages 11), interleaving the two modes' credential-resolution and usage-reporting logic in one long function body. A bug had to be fixed in three places and it was easy to add a path that silently worked in only one mode.

New module src/gateway/api/routes/_mode_strategy.py with two seams, selected once per request:
- RequestModeStrategy (StandaloneStrategy / PlatformStrategy): resolve() runs the platform resolve call or the auth plus budget-reservation flow and stashes the per-request state; resolve_mcp_servers() replaces the mcp_server_ids mode branch; make_settlement() builds the right settlement.
- RequestSettlement (StandaloneSettlement / PlatformSettlement): supplies the on_success / on_error / on_no_usage / on_incomplete hooks the streaming generators already accept, replacing the inline if platform_mode branches in the streaming callbacks and the standalone non-streaming tail.
The three handlers shed about 529 net lines; the top resolve block becomes strategy.resolve(spec=...).
Dispatch mechanics (the run_platform_attempts runner vs the standalone single call, the streaming fallback-vs-single split) stay in the handlers, as the issue permits.

Scope notes

Lighter seam, not full convergence onto run_platform_attempts. Standalone keeps its own dispatch because get_provider_kwargs returns richer credentials (client_args, api_version, Vertex env) than ResolvedAttempt models, so routing it through the attempt-based runner would silently drop them.
No behavior change on the wire for the refactor itself: status codes, headers, and budget reconcile/refund semantics are preserved.

Bundled bug fix: streaming pre-commit reservation leak

Investigating the seam surfaced a pre-existing standalone bug (introduced when the reservation system landed, not by this refactor). On /v1/messages and /v1/responses, a streaming request whose provider call failed before the first chunk logged the error but did not refund the budget reservation, while /v1/chat/completions and all non-streaming paths did. The leaked hold was never released (a budget-period reset zeroes spend but not reserved), so repeated pre-commit streaming failures accumulated reserved until the user hit a spurious "budget exceeded" 403 with low actual spend.

The fix routes the messages/responses pre-commit path through settlement.on_error (log plus refund), matching chat and the non-streaming handlers, and removes the now-dead on_provider_error_precommit hook. In platform mode this also reports an error outcome upstream on a pre-commit failure, consistent with how mid-stream errors already report. The HTTP status and body are unchanged; only users.reserved is released instead of leaked.

Test plan

make lint (ruff) and make typecheck (mypy strict, 154 files)
New tests/unit/test_mode_strategy.py: strategy selection, both settlements, no-DB reject paths
New tests/integration/test_mode_strategy_e2e.py (11 tests): standalone reconcile/refund settlement, the now-uniform pre-commit refund across all three endpoints, mcp_server_ids resolution, and platform request-id/correlation headers plus upstream usage reporting
Full suite green: 749 passed, 10 env-gated skips, including the platform-mode and standalone chat/messages/responses suites unchanged
uv run python scripts/generate_openapi.py --check (no schema change)

🤖 Generated with Claude Code

…rategy The chat, messages, and responses handlers each consumed config.is_platform_mode and re-branched on it inline at many sites, interleaving the two modes' credential-resolution and usage-reporting logic in one long function body. A bug had to be fixed in three places and it was easy to add a path that silently worked in only one mode. Introduce a RequestModeStrategy seam (src/gateway/api/routes/_mode_strategy.py) selected once per request: - resolve() runs the platform resolve call (platform) or auth plus budget reservation (standalone) and stashes the per-request state the handler reads. - A RequestSettlement object supplies the usage-settlement hooks the streaming generators already accept, replacing the inline "if platform_mode" branches in the streaming callbacks and the standalone non-streaming tail. Dispatch mechanics (the run_platform_attempts runner vs the standalone single call, the streaming fallback-vs-single split) stay in the handlers; only credential resolution and usage settlement move behind the seam. Standalone keeps its own dispatch because its provider kwargs are richer than ResolvedAttempt can model, so routing it through the attempt-based runner would silently drop credentials. No behavior change on the wire: status codes, headers, and budget reconcile/refund semantics are preserved, including the messages/responses pre-commit log-without-refund path. Adds tests/unit/test_mode_strategy.py covering each strategy and settlement in isolation; the existing standalone and platform integration suites pass unchanged. Fixes #102 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-06-09T16:41:35Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: d7a1b221-14fb-4734-92d3-44385c317840

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feat/request-mode-strategy

✨ Simplify code

Create PR with simplified code
Commit simplified code in branch feat/request-mode-strategy

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

…errors Standalone /v1/messages and /v1/responses streaming requests whose provider call failed before the first chunk logged the error but did not refund the budget reservation, while /v1/chat/completions and all non-streaming paths did. The leaked hold was never released (a budget-period reset zeroes spend but not reserved), so repeated pre-commit streaming failures accumulated reserved until the user hit a spurious "budget exceeded" 403 with low actual spend. The asymmetry predates the strategy refactor; this makes the behavior uniform. Route the messages/responses pre-commit except through settlement.on_error (log plus refund), matching chat and the non-streaming handlers, and remove the now-dead on_provider_error_precommit hook. In platform mode this also reports an error outcome upstream on a pre-commit failure, consistent with how mid-stream errors already report. No wire change: the HTTP status and body are unchanged; only users.reserved is released instead of leaked. Add tests/integration/test_mode_strategy_e2e.py exercising the strategy seam end-to-end in both modes: standalone reconcile/refund settlement, the now-uniform pre-commit refund across all three endpoints, mcp_server_ids resolution, and the platform request-id/correlation headers plus upstream usage reporting. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

njbrake temporarily deployed to integration-tests June 9, 2026 16:25 — with GitHub Actions Inactive

njbrake temporarily deployed to integration-tests June 9, 2026 16:48 — with GitHub Actions Inactive

njbrake mentioned this pull request Jun 9, 2026

fix(budget): refund the reservation on streaming pre-commit provider errors #129

Open

17 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor(routes): encapsulate standalone-vs-platform mode behind a strategy#127

refactor(routes): encapsulate standalone-vs-platform mode behind a strategy#127
njbrake wants to merge 2 commits into
mainfrom
feat/request-mode-strategy

njbrake commented Jun 9, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot commented Jun 9, 2026 •

edited

Loading

Review skipped

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

njbrake commented Jun 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Scope notes

Bundled bug fix: streaming pre-commit reservation leak

Test plan

Uh oh!

coderabbitai Bot commented Jun 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

njbrake commented Jun 9, 2026 •

edited

Loading

coderabbitai Bot commented Jun 9, 2026 •

edited

Loading