Every API key has three independent limits. Understanding how they interact makes it easy to run at high throughput without tripping errors.

The three limits

Request rate

How many API calls you can make per unit of time.

Sync concurrency budget

How many synchronous captures can be inflight at once.

Async admission queue

How much queued async work your key can hold, waiting to run.

The three limits apply to different parts of the flow:
  • Request rate gates every request you send, regardless of mode. Send too fast and calls are rejected before any work starts.
  • Sync concurrency budget applies to synchronous captures — ?mode=sync, the Prefer: wait header, and the POST /v1/search/:surface alias. Each inflight sync capture consumes one unit of the budget until the Envelope returns.
  • Async admission queue applies to asynchronous jobs. When you POST /v1/search (async default), the parent job and its children are admitted into a queue with a fixed depth. Children run as capacity frees up.
Async is the throughput path. Because work sits in the admission queue and drains as capacity opens, you can submit large batches without holding open connections. Prefer async + webhooks for volume.
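
For the sync path, one way to stay under the concurrency budget is to cap inflight captures on the client rather than waiting for a 429. The following is a minimal sketch, not part of the API: createLimiter and captureSync are hypothetical helper names, the limit of 4 is an arbitrary illustration (X-Concurrency-Limit reports the real budget for your key), and the request uses the POST /v1/search/:surface alias described above.

```javascript
// Hypothetical helper: lets at most `limit` tasks run at the same time.
function createLimiter(limit) {
  let running = 0;
  const waiting = [];

  function acquire() {
    if (running < limit) {
      running++;
      return Promise.resolve();
    }
    // No free slot: park until a finishing task hands its slot over.
    return new Promise((resolve) => waiting.push(resolve));
  }

  function release() {
    const next = waiting.shift();
    if (next) next(); // pass the slot straight to the next waiter
    else running--;
  }

  return async function run(task) {
    await acquire();
    try {
      return await task();
    } finally {
      release();
    }
  };
}

// Illustrative value; keep it at or below your sync concurrency budget.
const runSync = createLimiter(4);

// Hypothetical wrapper around the sync alias. The slot is released once
// fetch resolves, that is, when the response headers have arrived.
function captureSync(surface, body) {
  return runSync(() =>
    fetch(`https://api.aisearchapi.dev/v1/search/${surface}`, {
      method: "POST",
      headers: {
        Authorization: `Bearer ${process.env.AISEARCH_API_KEY}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(body),
    })
  );
}
```

Calling captureSync many times in a row starts at most four requests at once; the rest wait on the client without consuming request rate or budget.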

The 429 codes

When you hit a limit, the response is 429 with a machine-readable code. Each fires for a different reason.
Code | Fires when
RATE_LIMIT_EXCEEDED | You exceeded the request rate for your key. Slow the pace of calls.
CONCURRENCY_LIMIT_EXCEEDED | Too many synchronous captures are inflight at once; the sync concurrency budget is full.
QUEUE_CAPACITY_EXCEEDED | The async admission queue is full; there is no room to admit more work right now.
See Errors for the full error shape and other codes.
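
Because each code points at a different limit, a client can react differently to each one. The sketch below maps a code string to a suggested reaction. Only the three code strings come from this page; reactionFor429 and the reaction labels are illustrative, and the sketch deliberately takes the code as an argument because where it sits in the response body is defined on the Errors page, not here.

```javascript
// Hypothetical helper: maps a 429 code to a client-side reaction label.
function reactionFor429(code) {
  switch (code) {
    case "RATE_LIMIT_EXCEEDED":
      // Too many calls per unit of time: slow every kind of request.
      return "slow-request-pace";
    case "CONCURRENCY_LIMIT_EXCEEDED":
      // Sync budget is full: wait for an inflight sync capture to return.
      return "wait-for-sync-slot";
    case "QUEUE_CAPACITY_EXCEEDED":
      // Admission queue is full: pause new async submissions for now.
      return "pause-async-submissions";
    default:
      // Unknown code: fall back to waiting for Retry-After.
      return "honor-retry-after";
  }
}
```

Whatever the reaction, the Retry-After wait still applies before the next attempt.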

Rate-limit headers

All 429 responses carry a Retry-After header (seconds to wait) plus the standard rate-limit trio:
  • Retry-After (integer): Seconds to wait before retrying. Always honor this value.
  • X-RateLimit-Limit (integer): The request-rate ceiling for your key in the current window.
  • X-RateLimit-Remaining (integer): Requests left in the current window.
  • X-RateLimit-Reset (integer): When the window resets (epoch seconds).
Search responses (both sync and async) also carry concurrency headers so you can see budget pressure before you get a 429:
  • X-Concurrency-Limit (integer): Your sync concurrency budget.
  • X-Concurrency-Running (integer): Sync captures currently inflight.
  • X-Concurrency-Queued (integer): Async work currently waiting in the admission queue.
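
These headers can be read into plain numbers once per response and used to decide whether to send more work. The sketch below assumes nothing beyond the header names listed above; readLimitHeaders and syncHeadroom are hypothetical helper names, and the argument can be the headers object of a fetch Response or anything else with a get method.

```javascript
// Hypothetical helper: reads the documented headers into numbers.
// A missing or non-numeric header becomes null.
function readLimitHeaders(headers) {
  const num = (name) => {
    const raw = headers.get(name);
    if (raw === null || raw === undefined || raw === "") return null;
    const n = Number(raw);
    return Number.isFinite(n) ? n : null;
  };
  return {
    retryAfterSeconds: num("Retry-After"),
    rateLimit: num("X-RateLimit-Limit"),
    rateRemaining: num("X-RateLimit-Remaining"),
    rateResetEpochSeconds: num("X-RateLimit-Reset"),
    concurrencyLimit: num("X-Concurrency-Limit"),
    concurrencyRunning: num("X-Concurrency-Running"),
    concurrencyQueued: num("X-Concurrency-Queued"),
  };
}

// Sync slots still free, or null when the concurrency headers are absent.
function syncHeadroom(limits) {
  if (limits.concurrencyLimit === null || limits.concurrencyRunning === null) {
    return null;
  }
  return Math.max(0, limits.concurrencyLimit - limits.concurrencyRunning);
}
```

For example, syncHeadroom(readLimitHeaders(res.headers)) on a search response gives the number of sync captures that could still start before the budget is full.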

A live inflight picture

GET /v1/async/status returns a real-time snapshot for your key: admission queue depth and capacity, remaining sync concurrency budget, and inflight children grouped by surface and by region. Poll it when you want to pace submissions against actual headroom rather than reacting to 429s.
curl https://api.aisearchapi.dev/v1/async/status \
  -H "Authorization: Bearer $AISEARCH_API_KEY"
See the async status reference for the full response.
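
Pacing against headroom can be as simple as submitting only what fits and holding the rest for the next poll. The sketch below is an illustration under stated assumptions: splitByHeadroom is a hypothetical helper, queueDepth and queueCapacity are numbers you extract from the status response yourself (the field names are in the async status reference and are not assumed here), and each pending item is treated as one unit of queued work, which may undercount a submission that fans out into several children.

```javascript
// Hypothetical helper: splits pending work into what fits the admission
// queue now and what should wait for a later poll.
// reserve keeps some slots free for other producers sharing the key.
function splitByHeadroom(pending, queueDepth, queueCapacity, reserve = 0) {
  const headroom = Math.max(0, queueCapacity - queueDepth - reserve);
  return {
    submitNow: pending.slice(0, headroom),
    hold: pending.slice(headroom),
  };
}
```

Submit the submitNow items, then poll the status endpoint again after a pause before retrying the items in hold.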

1. Honor Retry-After

On any 429, wait at least the number of seconds in Retry-After before retrying. This is the single most important rule — it aligns your retries with when capacity actually frees up.

2. Add exponential backoff with jitter

If a limit persists, grow the wait between attempts (for example 1s, 2s, 4s, 8s) and add a little randomness so batched clients don’t retry in lockstep. Cap the number of attempts.

3. Use idempotencyKey for safe retries

Set the idempotencyKey body field on submissions. If a retry lands after the original was already accepted, the same job is returned instead of creating a duplicate — so retries can’t double-charge credits.

4. Prefer webhooks over tight poll loops

Instead of polling GET /v1/jobs/:id in a tight loop (which burns your request rate), submit async with a webhook and let completions come to you. Reserve polling for occasional reconciliation.
Tight poll loops are the most common cause of self-inflicted RATE_LIMIT_EXCEEDED. If you must poll, space it out and back off — or switch to webhooks.
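
If polling is unavoidable, a loop that widens the gap between polls keeps the request-rate cost low. The following is a rough sketch rather than a recommended client: pollWithBackoff, its option names, and the default intervals are all illustrative. The caller supplies fetchStatus (for example a GET to /v1/jobs/:id) and isDone, because the terminal status values are not listed on this page.

```javascript
// Hypothetical helper: polls until isDone(result) is true, doubling the
// wait between polls up to maxMs. sleep is injectable for testing.
async function pollWithBackoff(fetchStatus, isDone, {
  initialMs = 2000,
  maxMs = 30000,
  factor = 2,
  maxPolls = 20,
  sleep = (ms) => new Promise((r) => setTimeout(r, ms)),
} = {}) {
  let delay = initialMs;
  for (let i = 0; i < maxPolls; i++) {
    const result = await fetchStatus();
    if (isDone(result)) return result;
    if (i === maxPolls - 1) break; // no point waiting after the last poll
    await sleep(delay);
    delay = Math.min(maxMs, delay * factor);
  }
  throw new Error(`Job not finished after ${maxPolls} polls`);
}
```

With the defaults, the waits run 2s, 4s, 8s, 16s and then stay at 30s, so a long-running job costs a handful of requests instead of hundreds.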

Node retry-with-backoff

A minimal client that honors Retry-After, falls back to exponential backoff, and reuses an idempotencyKey so retries are safe:
import { randomUUID } from "node:crypto";

const API_KEY = process.env.AISEARCH_API_KEY;

async function submitWithRetry(body, { maxAttempts = 5 } = {}) {
  // Generate one key up front and reuse it across all retries of this submission.
  const idempotencyKey = randomUUID();

  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await fetch("https://api.aisearchapi.dev/v1/search", {
      method: "POST",
      headers: {
        Authorization: `Bearer ${API_KEY}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ ...body, idempotencyKey }),
    });

    if (res.status !== 429) return res; // success, or an error this loop does not retry
    if (attempt === maxAttempts - 1) break; // out of attempts: skip the final wait

    // Honor Retry-After when present; otherwise exponential backoff with jitter.
    const retryAfter = Number(res.headers.get("Retry-After"));
    const backoff = Number.isFinite(retryAfter) && retryAfter > 0
      ? retryAfter * 1000
      : (2 ** attempt) * 1000 + Math.random() * 250;

    await new Promise((r) => setTimeout(r, backoff));
  }

  throw new Error("Exhausted retries after repeated 429 responses");
}

const res = await submitWithRetry({
  query: "best noise-cancelling headphones under $300",
  surfaces: ["chatgpt", "perplexity"],
  regions: [{ country: "US" }],
});
const job = await res.json();
console.log(job.jobId, job.status);
The same pattern works for the synchronous endpoints: the loop retries on any 429, so CONCURRENCY_LIMIT_EXCEEDED is handled the same way as a rate limit.

Related pages:
  • Async status: live queue depth, budget, and inflight children for your key.
  • Errors: every error code and response shape, including the 429s.
  • Webhooks: get completions pushed to you instead of polling.
  • Asynchronous jobs: the high-throughput submission path.