Pools
| Pool | Methods | Limit |
|---|---|---|
| Read | GET, HEAD | 600 requests / minute |
| Write | POST, PUT, PATCH, DELETE | 60 requests / minute |
Polling an async job is a GET and counts against the read pool. The
POST that created the job counts against the write pool.
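The method-to-pool mapping in the table can be sketched as a small lookup. This is an illustration only: `pool_for_method` and the constant names are hypothetical helpers, not part of the API or any client library.

```python
# Hypothetical client-side helper mirroring the pools table above.
READ_METHODS = frozenset({"GET", "HEAD"})
WRITE_METHODS = frozenset({"POST", "PUT", "PATCH", "DELETE"})

# Limits copied from the table, in requests per minute.
POOL_LIMITS_PER_MINUTE = {"read": 600, "write": 60}


def pool_for_method(method: str) -> str:
    """Return the pool ("read" or "write") that an HTTP method draws from."""
    verb = method.upper()
    if verb in READ_METHODS:
        return "read"
    if verb in WRITE_METHODS:
        return "write"
    raise ValueError(f"method not covered by the pools table: {method!r}")
```

A client can use this to decide which budget a request will spend before sending it, for example to keep separate counters for reads and writes.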
Response headers
Every authenticated response (both 2xx and 429) includes:
| Header | Meaning |
|---|---|
| X-RateLimit-Pool | Pool that served this request: read or write. |
| X-RateLimit-Limit | Maximum requests in the current window for this pool. |
| X-RateLimit-Remaining | Requests remaining before the next refill. |
| X-RateLimit-Reset | Unix timestamp (seconds) when the bucket refills. |
429 responses additionally include Retry-After — the number of seconds
until the next request will succeed.
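One way to work with these headers is to parse them into a typed record once per response. The sketch below assumes the numeric headers carry plain integer strings and that `Retry-After` is given in whole seconds; `RateLimitInfo` and `parse_rate_limit_headers` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class RateLimitInfo:
    pool: str                          # "read" or "write"
    limit: int                         # max requests in the current window
    remaining: int                     # requests left before the next refill
    reset_at: int                      # Unix timestamp (seconds) of the refill
    retry_after: Optional[int] = None  # seconds; only sent with 429 responses


def parse_rate_limit_headers(headers: Mapping[str, str]) -> RateLimitInfo:
    """Build a RateLimitInfo from response headers (names matched case-insensitively)."""
    lowered = {name.lower(): value for name, value in headers.items()}
    retry_after = lowered.get("retry-after")
    return RateLimitInfo(
        pool=lowered["x-ratelimit-pool"],
        limit=int(lowered["x-ratelimit-limit"]),
        remaining=int(lowered["x-ratelimit-remaining"]),
        reset_at=int(lowered["x-ratelimit-reset"]),
        retry_after=int(retry_after) if retry_after is not None else None,
    )
```

Pass in the header mapping from whichever HTTP client you use. A missing rate-limit header raises `KeyError`, which is expected on the unauthenticated routes.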
Handling 429
When you receive a 429 response:
- Sleep for Retry-After seconds. This is the shortest interval that guarantees the next request lands after a refill.
- Retry the same request. The original was rejected before doing any work, so it is safe to repeat, including POST and DELETE.
- Back off if you hit 429 repeatedly. Multiple consecutive limit hits usually mean you are exceeding the pool's steady-state capacity, not just bursting. Add a backoff multiplier on top of Retry-After.
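The three steps above can be sketched as a small retry loop. This is a minimal sketch under stated assumptions: `call_with_retry` is a hypothetical helper, `send` stands in for whatever function issues your request and returns `(status, headers)`, the headers mapping is assumed to expose `Retry-After` under that exact name, and the default multiplier of 2 and limit of 5 attempts are illustrative values, not documented ones.

```python
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str]]  # (status code, response headers)


def call_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 5,
    multiplier: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send() until it returns a non-429 status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    factor = 1.0  # grows with each consecutive 429
    for attempt in range(max_attempts):
        status, headers = send()
        if status != 429:
            return status, headers
        if attempt < max_attempts - 1:
            # Retry-After is the floor; the factor adds backoff on top of it.
            sleep(float(headers["Retry-After"]) * factor)
            factor *= multiplier
    # Still rate limited after the last attempt: hand the 429 to the caller.
    return status, headers
```

The first wait is exactly `Retry-After`, so a single burst costs no extra delay. Only repeated 429s stretch the wait, which matches the advice to treat consecutive hits as a sign of sustained overload.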
Staying under the limit
- Watch X-RateLimit-Remaining. When it dips below 10% of the limit, slow down before you hit zero.
- Cache reads. GET /me, agent state, and conversation lists rarely change second-to-second. Caching for even a few seconds cuts read traffic significantly.
- Poll async jobs on a backoff. Start at 1 second, double up to 5 seconds. A tight poll loop burns read points for no benefit: the agent is not faster because you asked sooner.
- Use streaming for long turns. Accept: text/event-stream on the message POST returns incremental tokens over one connection instead of N poll requests. See the changelog entry on async message jobs for the trade-offs.
- Parallelize cautiously. Ten concurrent writers share the same 60-per-minute write pool. Coordinate at the worker level, not the request level.
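Two of these recommendations reduce to a few lines of client code: the polling schedule (start at 1 second, double, cap at 5 seconds) and the 10% threshold on X-RateLimit-Remaining. The sketch below is illustrative; `poll_delays` and `should_slow_down` are hypothetical helper names, and what "slowing down" means in practice is left to the caller.

```python
from typing import Iterator


def poll_delays(start: float = 1.0, cap: float = 5.0) -> Iterator[float]:
    """Yield poll intervals in seconds: start, then doubling, never above cap."""
    delay = start
    while True:
        yield delay
        delay = min(delay * 2, cap)


def should_slow_down(remaining: int, limit: int, threshold: float = 0.10) -> bool:
    """True once the remaining budget falls below threshold * limit."""
    return remaining < limit * threshold
```

With the defaults, `poll_delays()` produces 1, 2, 4, 5, 5, ... seconds. Sleep for the next value between status checks, and feed `should_slow_down` the parsed X-RateLimit-Remaining and X-RateLimit-Limit values after each response.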
Rate limiter unavailable
In the rare event the rate limiter store itself is down, the API returns 503 with system.rate_limit_unavailable. Treat this as a transient error
and retry with backoff. The limiter is fail-closed by design: requests are rejected rather than let through unmetered.
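A sketch of how a client might recognize this case and pick a retry delay. The page does not say where system.rate_limit_unavailable appears in the response, so the example assumes you have already extracted it into `error_code`. The helper names, the 0.5-second base, and the 30-second cap are illustrative assumptions. The delay uses exponential backoff with full jitter, a common choice for transient outages rather than anything this API prescribes.

```python
import random

RATE_LIMIT_UNAVAILABLE = "system.rate_limit_unavailable"


def is_limiter_outage(status: int, error_code: str) -> bool:
    """True when the response signals that the rate limiter store is down."""
    return status == 503 and error_code == RATE_LIMIT_UNAVAILABLE


def outage_backoff(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Delay in seconds before retry number `attempt` (0-based).

    The ceiling doubles with each attempt up to `cap`; the actual delay is
    drawn uniformly from [0, ceiling] so that clients do not retry in lockstep.
    """
    ceiling = min(cap, base * (2 ** attempt))
    return random.uniform(0, ceiling)
```

Jitter matters here because every client sees the outage at the same moment; spreading retries out avoids a synchronized burst when the limiter store comes back.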
Unauthenticated routes
GET /health, GET /openapi.json, and GET /openapi.yaml do not consume
token points — they use a separate per-IP limiter intended for monitoring.
Do not rely on these endpoints for application traffic.