API Rate Limit Simulator
Pick a rate-limiting algorithm, set its parameters and a traffic pattern, then watch exactly how many requests are allowed versus throttled over a second-by-second timeline, with stats, a chart and the headers each algorithm implies. Deterministic and computed entirely in your browser.
Effective long-run ceiling: 2 tokens/s, burst up to 20.
A bucket holds up to capacity tokens and refills at refill-rate tokens per second. Each request spends one token; if the bucket is empty the request is throttled. This permits short bursts up to the bucket size while bounding the long-run rate to the refill rate — the most common production choice.
Per-second simulation log
| Time (s) | Incoming | Allowed | Throttled (429) | Tokens left |
|---|---|---|---|---|
| 0 | 5 | 5 | 0 | 15 |
| 1 | 5 | 5 | 0 | 12 |
| 2 | 5 | 5 | 0 | 9 |
| 3 | 5 | 5 | 0 | 6 |
| 4 | 5 | 5 | 0 | 3 |
| 5 | 5 | 5 | 0 | 0 |
| 6 | 5 | 2 | 3 | 0 |
| 7 | 5 | 2 | 3 | 0 |
| 8 | 5 | 2 | 3 | 0 |
| 9 | 5 | 2 | 3 | 0 |
| 10 | 5 | 2 | 3 | 0 |
| 11 | 5 | 2 | 3 | 0 |
| 12 | 5 | 2 | 3 | 0 |
| 13 | 5 | 2 | 3 | 0 |
| 14 | 5 | 2 | 3 | 0 |
| 15 | 5 | 2 | 3 | 0 |
| 16 | 5 | 2 | 3 | 0 |
| 17 | 5 | 2 | 3 | 0 |
| 18 | 5 | 2 | 3 | 0 |
| 19 | 5 | 2 | 3 | 0 |
| 20 | 5 | 2 | 3 | 0 |
| 21 | 5 | 2 | 3 | 0 |
| 22 | 5 | 2 | 3 | 0 |
| 23 | 5 | 2 | 3 | 0 |
| 24 | 5 | 2 | 3 | 0 |
| 25 | 5 | 2 | 3 | 0 |
| 26 | 5 | 2 | 3 | 0 |
| 27 | 5 | 2 | 3 | 0 |
| 28 | 5 | 2 | 3 | 0 |
| 29 | 5 | 2 | 3 | 0 |
| 30 | 45 | 2 | 43 | 0 |
| 31 | 5 | 2 | 3 | 0 |
| 32 | 5 | 2 | 3 | 0 |
| 33 | 5 | 2 | 3 | 0 |
| 34 | 5 | 2 | 3 | 0 |
| 35 | 5 | 2 | 3 | 0 |
| 36 | 5 | 2 | 3 | 0 |
| 37 | 5 | 2 | 3 | 0 |
| 38 | 5 | 2 | 3 | 0 |
| 39 | 5 | 2 | 3 | 0 |
| 40 | 5 | 2 | 3 | 0 |
| 41 | 5 | 2 | 3 | 0 |
| 42 | 5 | 2 | 3 | 0 |
| 43 | 5 | 2 | 3 | 0 |
| 44 | 5 | 2 | 3 | 0 |
| 45 | 5 | 2 | 3 | 0 |
| 46 | 5 | 2 | 3 | 0 |
| 47 | 5 | 2 | 3 | 0 |
| 48 | 5 | 2 | 3 | 0 |
| 49 | 5 | 2 | 3 | 0 |
| 50 | 5 | 2 | 3 | 0 |
| 51 | 5 | 2 | 3 | 0 |
| 52 | 5 | 2 | 3 | 0 |
| 53 | 5 | 2 | 3 | 0 |
| 54 | 5 | 2 | 3 | 0 |
| 55 | 5 | 2 | 3 | 0 |
| 56 | 5 | 2 | 3 | 0 |
| 57 | 5 | 2 | 3 | 0 |
| 58 | 5 | 2 | 3 | 0 |
| 59 | 5 | 2 | 3 | 0 |
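The log above can be reproduced with a few lines of Python. This is a minimal sketch, not the simulator's actual code: the function name `simulate_token_bucket` and the tick order (refill at the start of each second after the first, then spend) are assumptions chosen because they match the table, with capacity 20, refill rate 2, a steady 5 requests per second and a spike of 45 at second 30.

```python
def simulate_token_bucket(capacity, refill_rate, traffic):
    """Per-second token bucket. Returns (time, incoming, allowed, throttled, tokens_left) rows."""
    tokens = capacity  # assumption: the bucket starts full
    rows = []
    for t, incoming in enumerate(traffic):
        if t > 0:
            # Refill once per elapsed second, never above capacity.
            tokens = min(capacity, tokens + refill_rate)
        allowed = min(incoming, tokens)  # each allowed request spends one token
        tokens -= allowed
        rows.append((t, incoming, allowed, incoming - allowed, tokens))
    return rows


# Traffic pattern from the log: 5 requests/s for 60 s, with a spike at t = 30.
traffic = [5] * 60
traffic[30] = 45

rows = simulate_token_bucket(capacity=20, refill_rate=2, traffic=traffic)
total_allowed = sum(r[2] for r in rows)
total_throttled = sum(r[3] for r in rows)

if __name__ == "__main__":
    for row in rows[:8]:
        print(row)
    print("allowed:", total_allowed, "throttled:", total_throttled)
```

The saved-up burst of 20 tokens is spent by second 5; from then on only the refill rate gets through, and the spike at second 30 is throttled almost entirely because no tokens have been banked.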
Picking and tuning a rate limit
The four algorithms here trade smoothness against burst tolerance and cost. The token bucket is the workhorse: it pins the long-run rate to the refill rate while letting a client spend a saved-up burst equal to the bucket capacity, which mirrors how real clients send traffic in clumps. The leaky bucket is its mirror image — it absorbs the same bursts into a queue but emits a perfectly even stream, ideal when a downstream system needs steady throughput and can tolerate the extra queuing latency.
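To make the contrast concrete, here is a minimal sketch of a leaky bucket modelled as a bounded queue. The names `simulate_leaky_bucket`, `queue_size` and `leak_rate` are illustrative, and the per-second order (enqueue first, then drain) is an assumption; the simulator may order these steps differently.

```python
def simulate_leaky_bucket(queue_size, leak_rate, traffic):
    """Per-second leaky bucket.

    Returns (time, incoming, accepted, rejected, emitted, queued) rows.
    """
    queued = 0
    rows = []
    for t, incoming in enumerate(traffic):
        # Accept requests only while the queue has room; the rest are throttled.
        accepted = min(incoming, queue_size - queued)
        queued += accepted
        # Drain at a fixed rate regardless of how bursty the input was.
        emitted = min(queued, leak_rate)
        queued -= emitted
        rows.append((t, incoming, accepted, incoming - accepted, emitted, queued))
    return rows


leaky_rows = simulate_leaky_bucket(queue_size=20, leak_rate=2, traffic=[5] * 10)

if __name__ == "__main__":
    for row in leaky_rows:
        print(row)
```

The `emitted` column never exceeds `leak_rate`, which is the steady output stream described above. The `queued` column is the price: each waiting request sits roughly `queued / leak_rate` seconds before it is served.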
The two window algorithms are about accuracy. A fixed window is the cheapest thing that works — one integer counter per window — but it has a notorious flaw: a client can fire the full limit in the last second of one window and the full limit again in the first second of the next, briefly doubling the effective rate. The sliding window log closes that gap by tracking the timestamps of recent requests and evicting them as they age out, giving an exact rolling count at the cost of storing per-request data. Run a spike near a window boundary in the simulator above to see the fixed-window burst appear and the sliding window suppress it.
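The boundary burst is easy to demonstrate in code. The sketch below assumes a limit of 10 requests per 10-second window and integer timestamps; the function names are illustrative, and only allowed requests are recorded in the sliding log, which is one common design choice rather than the only one.

```python
from collections import deque


def fixed_window(limit, window, timestamps):
    """Allow up to `limit` requests per aligned window of `window` seconds."""
    allowed = []
    current_window, count = None, 0
    for ts in timestamps:
        w = ts // window
        if w != current_window:
            current_window, count = w, 0  # counter resets at each boundary
        if count < limit:
            count += 1
            allowed.append(ts)
    return allowed


def sliding_window_log(limit, window, timestamps):
    """Allow a request only if fewer than `limit` were allowed in the last `window` seconds."""
    log = deque()
    allowed = []
    for ts in timestamps:
        # Evict timestamps that have aged out of the rolling window.
        while log and log[0] <= ts - window:
            log.popleft()
        if len(log) < limit:
            log.append(ts)
            allowed.append(ts)
    return allowed


# A client fires the full limit at t = 9 and again at t = 10.
burst = [9] * 10 + [10] * 10
fixed_allowed = fixed_window(limit=10, window=10, timestamps=burst)
sliding_allowed = sliding_window_log(limit=10, window=10, timestamps=burst)

if __name__ == "__main__":
    print("fixed window allowed:", len(fixed_allowed))
    print("sliding log allowed:", len(sliding_allowed))
```

The fixed window lets all 20 requests through within two seconds because the counter resets at t = 10. The sliding log still sees the 10 requests from t = 9 and rejects the second batch.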
Whatever you choose, communicate the limit to clients. Return X-RateLimit-Limit, X-RateLimit-Remaining and X-RateLimit-Reset on every response, and on a 429 always include Retry-After so well-behaved clients back off instead of hammering you. Size the long-run rate from your sustainable backend capacity and the burst from the largest legitimate spike you want to absorb, then validate against your real traffic pattern rather than a guess. Build the requests you'll be limiting in the API Request Builder.
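As a rough illustration, the headers for a token bucket could be derived like this. The helper `rate_limit_headers` is hypothetical, and the meaning given to `X-RateLimit-Reset` here (seconds until the bucket is full again) is an assumption: these `X-RateLimit-*` headers are a widely used convention rather than a standard, and some APIs send an epoch timestamp instead.

```python
import math


def rate_limit_headers(capacity, refill_rate, tokens, allowed):
    """Return (status_code, headers) for one response from a token-bucket limiter."""
    headers = {
        "X-RateLimit-Limit": str(capacity),
        "X-RateLimit-Remaining": str(int(tokens)),
        # Assumption: Reset = whole seconds until the bucket has fully refilled.
        "X-RateLimit-Reset": str(math.ceil((capacity - tokens) / refill_rate)),
    }
    if not allowed:
        # Seconds until at least one token is available again (minimum 1).
        wait = math.ceil((1 - tokens) / refill_rate)
        headers["Retry-After"] = str(max(1, wait))
    return (200 if allowed else 429), headers


ok_status, ok_headers = rate_limit_headers(20, 2, tokens=15, allowed=True)
limited_status, limited_headers = rate_limit_headers(20, 2, tokens=0, allowed=False)

if __name__ == "__main__":
    print(ok_status, ok_headers)
    print(limited_status, limited_headers)
```

Whichever semantics you pick for the reset value, document them, because clients cannot tell a delta from a timestamp by convention alone.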
In a distributed fleet the limiter usually lives in a shared store (a Redis token bucket or a sorted-set sliding window) so that all nodes enforce one global limit. Without shared state each node only sees its slice of the traffic, and the effective limit can grow to as much as the per-node limit multiplied by the node count. The same headers and retry semantics apply to event delivery, so document them alongside your webhook payloads, and fold the limit details into your reference with the OpenAPI Documentation Generator.
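The multiplication effect can be shown without any real infrastructure. In this sketch a plain in-memory object stands in for the shared store (it is not a Redis client), three hypothetical nodes receive requests round-robin, and the class and function names are illustrative.

```python
class TokenBucket:
    """Minimal in-memory token bucket, refilled once per simulated second."""

    def __init__(self, capacity, refill_rate):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = capacity

    def try_acquire(self):
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

    def refill(self):
        self.tokens = min(self.capacity, self.tokens + self.refill_rate)


def run(node_buckets, seconds, requests_per_second):
    """Spread requests round-robin over nodes; return the total allowed."""
    # Refill each distinct bucket exactly once per second, even if nodes share it.
    distinct = list({id(b): b for b in node_buckets}.values())
    allowed = 0
    for _ in range(seconds):
        for r in range(requests_per_second):
            if node_buckets[r % len(node_buckets)].try_acquire():
                allowed += 1
        for bucket in distinct:
            bucket.refill()
    return allowed


shared = TokenBucket(capacity=20, refill_rate=2)
shared_allowed = run([shared, shared, shared], seconds=60, requests_per_second=30)

local_allowed = run(
    [TokenBucket(20, 2) for _ in range(3)], seconds=60, requests_per_second=30
)

if __name__ == "__main__":
    print("shared state:", shared_allowed)
    print("per-node state:", local_allowed)
```

With one shared bucket the fleet admits 138 requests in a minute; with three independent buckets it admits three times that, even though every node is configured with the same limit.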
Trusted by Platform & Backend Engineers
“I used this to settle an argument about fixed window vs token bucket before a launch. Dropping our real traffic pattern in and watching the boundary burst on fixed window — then seeing token bucket smooth it out — was more convincing than any whiteboard. We shipped a token bucket with a capacity sized exactly from the spike test here.”
“The deterministic simulation is the selling point. Same inputs, same output, every time — so I can paste parameters into a runbook and a teammate reproduces the exact chart. The Retry-After explanation per algorithm matched what we needed to put in our gateway config almost verbatim.”
“Sliding window log behaviour is modelled correctly, including eviction of aged requests, which a lot of explainers get wrong. I'd love an option to model multiple nodes without shared state, but dividing the limit and simulating one node as the FAQ suggests works fine for now.”
“I teach rate limiting in workshops and this is now the centrepiece. The timeline chart with green allowed and red throttled bars makes the leaky bucket's smooth drain obvious at a glance, and everything runs in the browser so attendees can fork the parameters live. No setup, no backend.”
token bucket · fixed · sliding window · leaky bucket · deterministic · in-browser · Last reviewed: 2026-06