The three limits
Request rate
How many API calls you can make per unit of time.
Sync concurrency budget
How many synchronous captures can run inflight at once.
Async admission queue
How much queued async work your key can hold, waiting to run.
- Request rate gates every request you send, regardless of mode. Send too fast and calls are rejected before any work starts.
- Sync concurrency budget applies to synchronous captures: ?mode=sync, the Prefer: wait header, and the POST /v1/search/:surface alias. Each inflight sync capture consumes one unit of the budget until the Envelope returns.
- Async admission queue applies to asynchronous jobs. When you POST /v1/search (async default), the parent job and its children are admitted into a queue with a fixed depth. Children run as capacity frees up.
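The rules above can be sketched as a small classifier that reports which limits a request counts against. The request shape (`method`, `path`, `query`, `headers`) and the function name are hypothetical, and the sketch assumes that `?mode=sync` and `Prefer: wait` are only meaningful on `POST /v1/search`, which the text does not state explicitly.

```javascript
// Sketch: which limits does a request count against?
// The { method, path, query, headers } shape is illustrative, not part of the API.
function limitsFor({ method, path, query = {}, headers = {} }) {
  const limits = ["request rate"]; // gates every request, regardless of mode

  if (method !== "POST") return limits;

  // Header names are case-insensitive, so normalize before looking up Prefer.
  const prefer = Object.entries(headers).find(
    ([name]) => name.toLowerCase() === "prefer"
  );
  const prefersWait =
    prefer !== undefined && /\bwait\b/i.test(String(prefer[1]));

  const isSurfaceAlias = /^\/v1\/search\/[^/]+$/.test(path); // POST /v1/search/:surface
  const isSearch = path === "/v1/search";

  if (isSurfaceAlias || (isSearch && (query.mode === "sync" || prefersWait))) {
    limits.push("sync concurrency budget");
  } else if (isSearch) {
    limits.push("async admission queue"); // async is the default for POST /v1/search
  }
  return limits;
}
```

A request that matches neither branch, such as a status poll, counts against the request rate only.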
The 429 codes
When you hit a limit, the response is a 429 with a machine-readable code. Each code fires for a different reason.
| Code | Fires when |
|---|---|
| RATE_LIMIT_EXCEEDED | You exceeded the request rate for your key. Slow the pace of calls. |
| CONCURRENCY_LIMIT_EXCEEDED | Too many synchronous captures are inflight at once: the sync concurrency budget is full. |
| QUEUE_CAPACITY_EXCEEDED | The async admission queue is full; there is no room to admit more work right now. |
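One way to act on these codes is a lookup from code to the limit it signals and a suggested reaction. This is a sketch: the three code names come from the table, while the `explain429` helper and the wording of the suggested actions for the concurrency and queue codes are illustrative assumptions.

```javascript
// Sketch: map each documented 429 code to the limit behind it and a reaction.
// A Map avoids accidental hits on inherited object keys such as "toString".
const ACTIONS_BY_CODE = new Map([
  ["RATE_LIMIT_EXCEEDED", {
    limit: "request rate",
    action: "slow the pace of calls",
  }],
  ["CONCURRENCY_LIMIT_EXCEEDED", {
    limit: "sync concurrency budget",
    action: "wait for an inflight sync capture to return", // suggested, not documented
  }],
  ["QUEUE_CAPACITY_EXCEEDED", {
    limit: "async admission queue",
    action: "wait for queued work to drain", // suggested, not documented
  }],
]);

// Unknown codes fall back to the generic rule: honor Retry-After and retry.
function explain429(code) {
  return (
    ACTIONS_BY_CODE.get(code) ?? {
      limit: "unknown",
      action: "honor Retry-After and retry",
    }
  );
}
```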
Rate-limit headers
All 429 responses carry a Retry-After header (seconds to wait) plus the standard rate-limit trio:
Seconds to wait before retrying. Always honor this value.
The request-rate ceiling for your key in the current window.
Requests left in the current window.
When the window resets (epoch seconds).
429:
Your sync concurrency budget.
Sync captures currently inflight.
Async work currently waiting in the admission queue.
A live inflight picture
GET /v1/async/status returns a real-time snapshot for your key: admission queue depth and capacity, remaining sync concurrency budget, and inflight children grouped by surface and by region. Poll it when you want to pace submissions against actual headroom rather than reacting to 429s.
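Pacing against the snapshot might look like the sketch below. The response field names (`queue.depth`, `queue.capacity`, `sync.remaining`) are hypothetical, since the payload shape is not given here, and the bearer-token header and `baseUrl` parameter are likewise assumptions.

```javascript
// Sketch: size the next batch from a status snapshot.
// Assumed (hypothetical) snapshot shape:
//   { queue: { depth: number, capacity: number }, sync: { remaining: number } }
function headroom(snapshot) {
  return {
    asyncSlots: Math.max(0, snapshot.queue.capacity - snapshot.queue.depth),
    syncSlots: Math.max(0, snapshot.sync.remaining),
  };
}

// Take only as many pending submissions as the queue has room for right now.
function nextBatch(pending, snapshot) {
  return pending.slice(0, headroom(snapshot).asyncSlots);
}

// Fetch the snapshot. fetchImpl is injectable so the function can be tested
// without a network; it defaults to the global fetch (Node 18+).
async function fetchSnapshot(baseUrl, apiKey, fetchImpl = fetch) {
  const res = await fetchImpl(`${baseUrl}/v1/async/status`, {
    headers: { Authorization: `Bearer ${apiKey}` }, // auth scheme is an assumption
  });
  if (!res.ok) throw new Error(`status request failed: ${res.status}`);
  return res.json();
}
```

Because a parent job and its children are all admitted into the queue, a single submission may need more than one free slot; treating one submission as one slot is a simplification.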
Recommended backoff strategy
Honor Retry-After
On any 429, wait at least the number of seconds in Retry-After before retrying. This is the single most important rule: it aligns your retries with when capacity actually frees up.
Add exponential backoff with jitter
If a limit persists, grow the wait between attempts (for example 1s, 2s, 4s, 8s) and add a little randomness so batched clients don’t retry in lockstep. Cap the number of attempts.
Use idempotencyKey for safe retries
Set the idempotencyKey body field on submissions. If a retry lands after the original was already accepted, the same job is returned instead of creating a duplicate, so retries can’t double-charge credits.
Node retry-with-backoff
A minimal client that honors Retry-After, falls back to exponential backoff, and reuses an idempotencyKey so retries are safe:
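The sketch below is one way to write such a client. It assumes Node 18+ (global `fetch`) and bearer-token auth, neither of which is specified above. The helper names, the 25% jitter factor, and the 30-second cap are illustrative choices. The caller supplies `idempotencyKey` in the body so every attempt sends the same key.

```javascript
// Sketch of a retrying submit; see the assumptions listed above.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Exponential backoff (1s, 2s, 4s, ...) capped at maxMs, plus up to 25% jitter
// so batched clients do not retry in lockstep.
function backoffMs(attempt, { baseMs = 1000, maxMs = 30000, random = Math.random } = {}) {
  const exp = Math.min(maxMs, baseMs * 2 ** attempt);
  return exp + random() * exp * 0.25;
}

// Retry-After (seconds) -> milliseconds; null when absent or not a number.
function retryAfterMs(headers) {
  const raw = headers.get("retry-after");
  const seconds = raw === null ? NaN : Number(raw);
  return Number.isFinite(seconds) && seconds >= 0 ? seconds * 1000 : null;
}

async function submitWithRetry(url, body, opts = {}) {
  const { apiKey, maxAttempts = 5, fetchImpl = fetch, sleepImpl = sleep } = opts;
  // Serialize once: the same idempotencyKey goes out on every attempt, so a
  // retry that lands after an accepted original maps to that same job.
  const payload = JSON.stringify(body);

  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await fetchImpl(url, {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`, // auth scheme is an assumption
      },
      body: payload,
    });
    if (res.status !== 429) return res; // success or a non-retryable error
    if (attempt === maxAttempts - 1) break; // attempts are capped

    // Wait at least Retry-After, and never less than this attempt's backoff.
    const wait = Math.max(retryAfterMs(res.headers) ?? 0, backoffMs(attempt));
    await sleepImpl(wait);
  }
  throw new Error(`still rate limited after ${maxAttempts} attempts`);
}

// Usage (hypothetical host; generate the key once per logical submission):
// await submitWithRetry("https://api.example.com/v1/search",
//   { idempotencyKey: "order-1234-capture" /* , ...capture parameters */ },
//   { apiKey: process.env.API_KEY });
```

Taking the larger of Retry-After and the computed backoff keeps the "wait at least Retry-After" rule intact while still growing the delay when a limit persists.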
The same pattern works for the synchronous endpoints: retry on CONCURRENCY_LIMIT_EXCEEDED the same way you retry on rate limits.
Related
Async status
Live queue depth, budget, and inflight children for your key.
Errors
Every error code and response shape, including the 429s.
Webhooks
Get completions pushed to you instead of polling.
Asynchronous jobs
The high-throughput submission path.