Token bucket · fixed · sliding window · leaky bucket

API Rate Limit Simulator

Pick a rate-limiting algorithm, set its parameters and a traffic pattern, then watch exactly how many requests are allowed versus throttled over a second-by-second timeline, with stats, a chart and the headers each algorithm implies. The simulation is deterministic and runs entirely in your browser.

01 · Configure

Effective long-run ceiling: 2 tokens/s, burst up to 20.

Total requests: 340
Allowed (HTTP 200): 138
Throttled (HTTP 429): 202
Throttle rate: 59.4%
Peak tokens: 15
Timeline — allowed vs throttled per second
[Chart: one bar per simulated second, split into Allowed (200) and Throttled (429).]
How Token Bucket behaves

A bucket holds up to capacity tokens and refills at refill-rate tokens per second. Each request spends one token; if the bucket is empty, the request is throttled. This permits short bursts up to the bucket size while bounding the long-run rate to the refill rate, which is why it is a common production choice.

Chart level: Level = tokens remaining after the tick.
Headers: Emit X-RateLimit-Limit (capacity), X-RateLimit-Remaining (tokens left) and Retry-After ≈ (1 − tokens) / refillRate seconds when empty.
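
As a rough illustration of the header rule above, the sketch below builds the three headers from a bucket's current state. The function name and the choice to round Retry-After up to a whole second are assumptions made for this example (HTTP expresses the delay as a non-negative integer number of seconds); the formula itself is the one stated above.

```python
import math


def rate_limit_headers(capacity, tokens, refill_rate):
    """Build illustrative rate-limit headers for a token bucket (hypothetical helper)."""
    headers = {
        "X-RateLimit-Limit": str(int(capacity)),
        "X-RateLimit-Remaining": str(max(0, math.floor(tokens))),
    }
    if tokens < 1:
        # Time until one whole token is available: (1 - tokens) / refillRate.
        wait_seconds = (1 - tokens) / refill_rate
        # Assumption: round up so the client never retries too early.
        headers["Retry-After"] = str(math.ceil(wait_seconds))
    return headers


print(rate_limit_headers(capacity=20, tokens=0, refill_rate=2))
print(rate_limit_headers(capacity=20, tokens=15, refill_rate=2))
```

With an empty bucket and a refill rate of 2 tokens/s the raw wait is 0.5 s, which this sketch reports as Retry-After: 1. With 15 tokens left, no Retry-After header is emitted.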
Deep analysis

Per-second simulation log

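Time (s)  Incoming  Allowed  Throttled (429)  Tokens left
0         5         5        0                15
1         5         5        0                12
2         5         5        0                9
3         5         5        0                6
4         5         5        0                3
5         5         5        0                0
6         5         2        3                0
7         5         2        3                0
8         5         2        3                0
9         5         2        3                0
10        5         2        3                0
11        5         2        3                0
12        5         2        3                0
13        5         2        3                0
14        5         2        3                0
15        5         2        3                0
16        5         2        3                0
17        5         2        3                0
18        5         2        3                0
19        5         2        3                0
20        5         2        3                0
21        5         2        3                0
22        5         2        3                0
23        5         2        3                0
24        5         2        3                0
25        5         2        3                0
26        5         2        3                0
27        5         2        3                0
28        5         2        3                0
29        5         2        3                0
30        45        2        43               0
31        5         2        3                0
32        5         2        3                0
33        5         2        3                0
34        5         2        3                0
35        5         2        3                0
36        5         2        3                0
37        5         2        3                0
38        5         2        3                0
39        5         2        3                0
40        5         2        3                0
41        5         2        3                0
42        5         2        3                0
43        5         2        3                0
44        5         2        3                0
45        5         2        3                0
46        5         2        3                0
47        5         2        3                0
48        5         2        3                0
49        5         2        3                0
50        5         2        3                0
51        5         2        3                0
52        5         2        3                0
53        5         2        3                0
54        5         2        3                0
55        5         2        3                0
56        5         2        3                0
57        5         2        3                0
58        5         2        3                0
59        5         2        3                0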
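
The log above can be approximated with a few lines of Python. This is a minimal sketch, not the simulator's own code: it assumes the bucket starts full, that the refill is applied once at the start of every tick after the first, and that the traffic is a steady 5 requests/s with a 45-request spike at second 30 (all inferred from the log). The capacity of 20 and refill rate of 2 tokens/s come from the configuration summary.

```python
def simulate_token_bucket(traffic, capacity, refill_rate):
    """Return one (second, incoming, allowed, throttled, tokens_left) row per tick."""
    tokens = capacity  # assumption: the bucket starts full
    rows = []
    for second, incoming in enumerate(traffic):
        if second > 0:
            # Assumption: refill once per tick, capped at the bucket capacity.
            tokens = min(capacity, tokens + refill_rate)
        allowed = min(incoming, int(tokens))  # each request spends one whole token
        tokens -= allowed
        rows.append((second, incoming, allowed, incoming - allowed, tokens))
    return rows


# Traffic pattern inferred from the log: 5 requests/s, spike of 45 at t = 30.
traffic = [45 if s == 30 else 5 for s in range(60)]
rows = simulate_token_bucket(traffic, capacity=20, refill_rate=2)

total_allowed = sum(r[2] for r in rows)
total_throttled = sum(r[3] for r in rows)
peak_tokens = max(r[4] for r in rows)
print(total_allowed, total_throttled, peak_tokens)  # 138 202 15
```

Under these assumptions the sketch reproduces the totals shown in the stats (138 allowed, 202 throttled, peak of 15 tokens). A real limiter would typically refill continuously based on elapsed time rather than in one-second ticks.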
Field notes

Picking and tuning a rate limit

The four algorithms here trade smoothness against burst tolerance and cost. The token bucket is the workhorse: it pins the long-run rate to the refill rate while letting a client spend a saved-up burst equal to the bucket capacity, which mirrors how real clients send traffic in clumps. The leaky bucket is its mirror image — it absorbs the same bursts into a queue but emits a perfectly even stream, ideal when a downstream system needs steady throughput and can tolerate the extra queuing latency.
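
To make the leaky bucket's trade-off concrete, here is a small sketch of the queue-based variant described above. The function name, the tick-based drain, and the parameters queue_size and drain_rate are assumptions chosen for illustration; the simulator's own model may differ.

```python
from collections import deque


def simulate_leaky_bucket(traffic, queue_size, drain_rate):
    """Return (second, incoming, emitted, dropped, queued, max_wait) per tick."""
    queue = deque()  # holds the arrival second of each waiting request
    rows = []
    for second, incoming in enumerate(traffic):
        dropped = 0
        for _ in range(incoming):
            if len(queue) < queue_size:
                queue.append(second)
            else:
                dropped += 1  # queue full: the request is throttled
        # Drain at a fixed rate, oldest request first.
        emitted = min(drain_rate, len(queue))
        waits = [second - queue.popleft() for _ in range(emitted)]
        rows.append((second, incoming, emitted, dropped, len(queue), max(waits, default=0)))
    return rows


# A single burst of 10 requests into a queue of 6, drained at 2 requests/s.
for row in simulate_leaky_bucket([10, 0, 0, 0, 0], queue_size=6, drain_rate=2):
    print(row)
```

The burst leaves as an even 2 requests/s over three seconds, and the last pair waits two seconds in the queue, which is the extra latency the paragraph above refers to.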

The two window algorithms are about accuracy. A fixed window is the cheapest thing that works — one integer counter per window — but it has a notorious flaw: a client can fire the full limit in the last second of one window and the full limit again in the first second of the next, briefly doubling the effective rate. The sliding window log closes that gap by tracking the timestamps of recent requests and evicting them as they age out, giving an exact rolling count at the cost of storing per-request data. Run a spike near a window boundary in the simulator above to see the fixed-window burst appear and the sliding window suppress it.
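
The boundary burst can be shown with two short functions. This is a sketch under assumed names (fixed_window, sliding_window_log) and an assumed limit of 10 requests per 60-second window; timestamps are in seconds.

```python
from collections import deque


def fixed_window(timestamps, limit, window):
    """Allow up to `limit` requests per aligned window; one counter, reset at each boundary."""
    allowed, current_window, count = [], None, 0
    for t in timestamps:
        w = int(t // window)
        if w != current_window:
            current_window, count = w, 0
        if count < limit:
            count += 1
            allowed.append(t)
    return allowed


def sliding_window_log(timestamps, limit, window):
    """Allow a request only if fewer than `limit` were allowed in the last `window` seconds."""
    log, allowed = deque(), []
    for t in timestamps:
        while log and log[0] <= t - window:
            log.popleft()  # evict requests that have aged out
        if len(log) < limit:
            log.append(t)
            allowed.append(t)
    return allowed


# 10 requests in the last second of one window, 10 more in the first second of the next.
burst = [59.0] * 10 + [60.0] * 10
print(len(fixed_window(burst, limit=10, window=60)))        # 20
print(len(sliding_window_log(burst, limit=10, window=60)))  # 10
```

The fixed window admits all 20 requests within about one second, while the sliding window log holds the rolling count at 10.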

Whatever you choose, communicate the limit to clients. Return X-RateLimit-Limit, X-RateLimit-Remaining and X-RateLimit-Reset on every response, and on a 429 always include Retry-After so well-behaved clients back off instead of hammering you. Size the long-run rate from your sustainable backend capacity and the burst from the largest legitimate spike you want to absorb, then validate against your real traffic pattern rather than a guess. Build the requests you'll be limiting in the API Request Builder.
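
On the client side, backing off means reading Retry-After correctly. HTTP allows the value to be either a number of seconds or an HTTP date, so a client needs to handle both. The helper below is a hypothetical sketch using only the Python standard library; the function name and the choice to return None for unparseable values are assumptions.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(value, now=None):
    """Convert a Retry-After header value to a delay in seconds, or None if unparseable."""
    now = now or datetime.now(timezone.utc)
    value = value.strip()
    if value.isascii() and value.isdigit():
        return float(value)  # delay-seconds form, e.g. "7"
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)  # assumption: treat naive dates as UTC
    return max(0.0, (when - now).total_seconds())


reference = datetime(2015, 10, 21, 7, 27, 30, tzinfo=timezone.utc)
print(retry_after_seconds("7"))                                          # 7.0
print(retry_after_seconds("Wed, 21 Oct 2015 07:28:00 GMT", reference))   # 30.0
```

A client would sleep for the returned delay before retrying, and fall back to its own backoff policy when the result is None.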

In a distributed fleet the limiter usually lives in a shared store — a Redis token bucket or a sorted-set sliding window — so all nodes enforce one global limit; without shared state each node only sees its slice and the effective limit multiplies by the node count. The same headers and retry semantics apply to event delivery, so document them alongside your webhook payloads, and fold the limit details into your reference with the OpenAPI Documentation Generator.
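
The multiplication effect is easy to see in a toy model. The sketch below counts how many requests get through in a single window when each node keeps its own counter versus when all nodes share one; the function name and the round-robin load balancing are assumptions for illustration, and a list stands in for the shared store.

```python
def allowed_in_window(requests, limit, nodes, shared):
    """Count allowed requests in one window with per-node or shared counters."""
    counters = [0] if shared else [0] * nodes
    allowed = 0
    for i in range(requests):
        # Assumption: requests are spread round-robin across nodes.
        slot = 0 if shared else i % nodes
        if counters[slot] < limit:
            counters[slot] += 1
            allowed += 1
    return allowed


print(allowed_in_window(100, limit=10, nodes=4, shared=False))  # 40
print(allowed_in_window(100, limit=10, nodes=4, shared=True))   # 10
```

With four independent nodes the nominal limit of 10 becomes an effective limit of 40; with shared state it stays at 10. In production the shared counter would live in an external store and be updated atomically.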


Trusted by Platform & Backend Engineers

Rated 4.8, based on 1,270 reviews

I used this to settle an argument about fixed window vs token bucket before a launch. Dropping our real traffic pattern in and watching the boundary burst on fixed window — then seeing token bucket smooth it out — was more convincing than any whiteboard. We shipped a token bucket with a capacity sized exactly from the spike test here.

Priya Nair
Backend platform engineer
June 12, 2026

The deterministic simulation is the selling point. Same inputs, same output, every time — so I can paste parameters into a runbook and a teammate reproduces the exact chart. The Retry-After explanation per algorithm matched what we needed to put in our gateway config almost verbatim.

Marcus Feld
Staff SRE
May 29, 2026

Sliding window log behaviour is modelled correctly, including eviction of aged requests, which a lot of explainers get wrong. I'd love an option to model multiple nodes without shared state, but dividing the limit and simulating one node as the FAQ suggests works fine for now.

Yuki Tanaka
API gateway developer
April 21, 2026

I teach rate limiting in workshops and this is now the centrepiece. The timeline chart with green allowed and red throttled bars makes the leaky bucket's smooth drain obvious at a glance, and everything runs in the browser so attendees can fork the parameters live. No setup, no backend.

Sofia Marchetti
Developer advocate
March 8, 2026


Last reviewed: 2026-06