refactor(routes): encapsulate standalone-vs-platform mode behind a strategy#127
Draft
njbrake wants to merge 2 commits into
Draft
refactor(routes): encapsulate standalone-vs-platform mode behind a strategy#127njbrake wants to merge 2 commits into
njbrake wants to merge 2 commits into
Conversation
…rategy The chat, messages, and responses handlers each consumed config.is_platform_mode and re-branched on it inline at many sites, interleaving the two modes' credential-resolution and usage-reporting logic in one long function body. A bug had to be fixed in three places and it was easy to add a path that silently worked in only one mode. Introduce a RequestModeStrategy seam (src/gateway/api/routes/_mode_strategy.py) selected once per request: - resolve() runs the platform resolve call (platform) or auth plus budget reservation (standalone) and stashes the per-request state the handler reads. - A RequestSettlement object supplies the usage-settlement hooks the streaming generators already accept, replacing the inline "if platform_mode" branches in the streaming callbacks and the standalone non-streaming tail. Dispatch mechanics (the run_platform_attempts runner vs the standalone single call, the streaming fallback-vs-single split) stay in the handlers; only credential resolution and usage settlement move behind the seam. Standalone keeps its own dispatch because its provider kwargs are richer than ResolvedAttempt can model, so routing it through the attempt-based runner would silently drop credentials. No behavior change on the wire: status codes, headers, and budget reconcile/refund semantics are preserved, including the messages/responses pre-commit log-without-refund path. Adds tests/unit/test_mode_strategy.py covering each strategy and settlement in isolation; the existing standalone and platform integration suites pass unchanged. Fixes #102 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Important Review skippedDraft detected. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
✨ Simplify code
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
…errors Standalone /v1/messages and /v1/responses streaming requests whose provider call failed before the first chunk logged the error but did not refund the budget reservation, while /v1/chat/completions and all non-streaming paths did. The leaked hold was never released (a budget-period reset zeroes spend but not reserved), so repeated pre-commit streaming failures accumulated reserved until the user hit a spurious "budget exceeded" 403 with low actual spend. The asymmetry predates the strategy refactor; this makes the behavior uniform. Route the messages/responses pre-commit except through settlement.on_error (log plus refund), matching chat and the non-streaming handlers, and remove the now-dead on_provider_error_precommit hook. In platform mode this also reports an error outcome upstream on a pre-commit failure, consistent with how mid-stream errors already report. No wire change: the HTTP status and body are unchanged; only users.reserved is released instead of leaked. Add tests/integration/test_mode_strategy_e2e.py exercising the strategy seam end-to-end in both modes: standalone reconcile/refund settlement, the now-uniform pre-commit refund across all three endpoints, mcp_server_ids resolution, and the platform request-id/correlation headers plus upstream usage reporting. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
17 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.


Summary
Fixes #102. The chat, messages, and responses handlers each read
config.is_platform_modeand re-branched on it inline at many sites (chat 15, responses 13, messages 11), interleaving the two modes' credential-resolution and usage-reporting logic in one long function body. A bug had to be fixed in three places and it was easy to add a path that silently worked in only one mode.src/gateway/api/routes/_mode_strategy.pywith two seams, selected once per request:RequestModeStrategy(StandaloneStrategy/PlatformStrategy):resolve()runs the platform resolve call or the auth plus budget-reservation flow and stashes the per-request state;resolve_mcp_servers()replaces themcp_server_idsmode branch;make_settlement()builds the right settlement.RequestSettlement(StandaloneSettlement/PlatformSettlement): supplies theon_success/on_error/on_no_usage/on_incompletehooks the streaming generators already accept, replacing the inlineif platform_modebranches in the streaming callbacks and the standalone non-streaming tail.strategy.resolve(spec=...).run_platform_attemptsrunner vs the standalone single call, the streaming fallback-vs-single split) stay in the handlers, as the issue permits.Scope notes
run_platform_attempts. Standalone keeps its own dispatch becauseget_provider_kwargsreturns richer credentials (client_args,api_version, Vertex env) thanResolvedAttemptmodels, so routing it through the attempt-based runner would silently drop them.Bundled bug fix: streaming pre-commit reservation leak
Investigating the seam surfaced a pre-existing standalone bug (introduced when the reservation system landed, not by this refactor). On
/v1/messagesand/v1/responses, a streaming request whose provider call failed before the first chunk logged the error but did not refund the budget reservation, while/v1/chat/completionsand all non-streaming paths did. The leaked hold was never released (a budget-period reset zeroesspendbut notreserved), so repeated pre-commit streaming failures accumulatedreserveduntil the user hit a spurious "budget exceeded" 403 with low actual spend.The fix routes the messages/responses pre-commit path through
settlement.on_error(log plus refund), matching chat and the non-streaming handlers, and removes the now-deadon_provider_error_precommithook. In platform mode this also reports an error outcome upstream on a pre-commit failure, consistent with how mid-stream errors already report. The HTTP status and body are unchanged; onlyusers.reservedis released instead of leaked.Test plan
make lint(ruff) andmake typecheck(mypy strict, 154 files)tests/unit/test_mode_strategy.py: strategy selection, both settlements, no-DB reject pathstests/integration/test_mode_strategy_e2e.py(11 tests): standalone reconcile/refund settlement, the now-uniform pre-commit refund across all three endpoints,mcp_server_idsresolution, and platform request-id/correlation headers plus upstream usage reportinguv run python scripts/generate_openapi.py --check(no schema change)🤖 Generated with Claude Code