fix(budget): refund the reservation on streaming pre-commit provider errors#129
fix(budget): refund the reservation on streaming pre-commit provider errors#129njbrake wants to merge 2 commits into
Conversation
WalkthroughThis PR fixes a budget reservation leak in streaming endpoints when upstream provider errors occur before streaming begins. It adds missing refund calls to ChangesStreaming pre-commit reservation refund fixes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
✨ Simplify code
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
…errors
A streaming /v1/messages or /v1/responses request whose provider call fails
before the first chunk ("pre-commit") logged the error but did not refund the
budget pre-debit hold, while /v1/chat/completions and all non-streaming paths
did. Because a budget-period reset zeroes spend but not reserved, the leaked
hold was never released: repeated pre-commit streaming failures accumulated
reserved until the user hit a spurious "budget exceeded" 403 with low actual
spend.
Add the missing refund_reservation to the messages/responses streaming
pre-commit except blocks. No wire change: the HTTP status and body are
unchanged; only users.reserved is released instead of leaked. Adds a regression
test asserting reserved returns to 0 across all three endpoints.
Fixes #128
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2d5c879 to
28836c0
Compare
There was a problem hiding this comment.
Pull request overview
This PR fixes a standalone-mode budget accounting bug where streaming requests to /v1/messages and /v1/responses that fail before the first streamed chunk (“pre-commit”) would log the error but not refund the budget reservation hold, causing users.reserved to leak upward over time and eventually trigger incorrect “budget exceeded” rejections.
Changes:
- Refund the budget reservation in the streaming pre-commit
exceptblocks for/v1/messagesand/v1/responses, aligning behavior with chat streaming and non-streaming handlers. - Update the in-code comments to accurately describe the intended “log + refund” behavior.
- Add integration regression tests asserting
users.reservedreturns to0after pre-commit streaming failures across/v1/chat/completions,/v1/messages, and/v1/responses.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
src/gateway/api/routes/messages.py |
Adds missing refund_reservation call on streaming pre-commit provider errors (standalone DB path). |
src/gateway/api/routes/responses.py |
Adds missing refund_reservation call on streaming pre-commit provider errors (standalone DB path). |
tests/integration/test_streaming_precommit_refund.py |
New integration coverage to prevent regressions of leaked users.reserved holds on pre-commit streaming failures. |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
tests/integration/test_streaming_precommit_refund.py (1)
48-77: 💤 Low valueConsider reusing the engine across helper calls for efficiency.
Each call to
_user_stateand_poll_usage_logscreates and disposes a new SQLAlchemy engine. For integration tests this works fine, but if you wanted to tighten it up, you could pass the engine as a parameter or use a module-scoped fixture. That said, this is test code and correctness > micro-optimization here, so totally fine to leave as-is.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/integration/test_streaming_precommit_refund.py` around lines 48 - 77, Both _user_state and _poll_usage_logs create and dispose a new SQLAlchemy engine per call which is inefficient; refactor to accept a shared engine or session factory (e.g., pass an engine or a sessionmaker into _user_state and _poll_usage_logs) or use a module-scoped fixture so create_engine is only called once, update calls to these helpers to supply the shared engine/sessionmaker, and remove per-call engine.dispose() so the shared engine is not prematurely torn down.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@tests/integration/test_streaming_precommit_refund.py`:
- Around line 48-77: Both _user_state and _poll_usage_logs create and dispose a
new SQLAlchemy engine per call which is inefficient; refactor to accept a shared
engine or session factory (e.g., pass an engine or a sessionmaker into
_user_state and _poll_usage_logs) or use a module-scoped fixture so
create_engine is only called once, update calls to these helpers to supply the
shared engine/sessionmaker, and remove per-call engine.dispose() so the shared
engine is not prematurely torn down.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 8a00decf-4546-4916-85cc-582af30f2996
📒 Files selected for processing (3)
src/gateway/api/routes/messages.pysrc/gateway/api/routes/responses.pytests/integration/test_streaming_precommit_refund.py
# Conflicts: # src/gateway/api/routes/messages.py # src/gateway/api/routes/responses.py


Description
Fixes #128. In standalone mode, a streaming request to
/v1/messagesor/v1/responseswhose upstream provider call fails before the first chunk ("pre-commit") logged the error but did not refund the budget pre-debit hold./v1/chat/completionsand all non-streaming paths already refund here. Because a budget-period reset zeroesspendbut notreserved, the leaked hold was never released, so repeated failures accumulatedreserveduntil the user hit a spurious "budget exceeded" 403 with low actual spend.This adds the missing
refund_reservationto the messages/responses streaming pre-commitexceptblocks, making all three endpoints consistent.users.reservedis released instead of leaked.Relationship to #127
This is the standalone bug fix, split out of the strategy-seam refactor (#127) so it can land independently. #127 carries the same fix in its refactored form; once this merges, #127 will be rebased and reconciled.
PR Type
Relevant issues
Fixes #128
Checklist
tests/unit,tests/integration).make lint,make typecheck,make test).uv run python scripts/generate_openapi.py).Test plan
tests/integration/test_streaming_precommit_refund.py: assertsusers.reservedreturns to 0 after a pre-commit streaming failure on/v1/chat/completions,/v1/messages, and/v1/responses(with a budgeted user + priced model so the hold is genuinely non-zero).reserved > 0) and pass with it.make lint(ruff) andmake typecheck(mypy strict) pass.AI Usage
AI Model/Tool used: Claude Code
Any additional AI details you'd like to share:
Claude drafted the fix and tests via back-and-forth with @njbrake; the reasoning and decisions are his.
NOTE:
When responding to reviewer questions, please respond yourself rather than copy/pasting reviewer comments into an AI and pasting back its answer. We want to discuss with you, not your AI :)