The gateway uses a per-user token bucket rate limiter. Each authenticated user (JWT sub claim) gets its own bucket.
Enable
    # --rate-limit 100: max requests per second per user
    # --rate-limit-capacity 20: max burst size per user
    qql-go serve \
      --rate-limit 100 \
      --rate-limit-capacity 20

| Flag | Default | Description |
|---|---|---|
| --rate-limit | 0 | Requests per second per user. 0 = unlimited. |
| --rate-limit-capacity | 20 | Max burst size: the maximum number of tokens that can accumulate. |
Behavior
When a user's bucket is empty:
- Request gets 429 Resource Exhausted
- Response includes a Retry-After header (seconds until the next token is available)
When a user's bucket has tokens:
- Request proceeds normally
- One token is consumed
Tokens refill at the configured rate (e.g., --rate-limit 100 = 100 tokens/second).
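Because tokens refill at a fixed rate, the wait until the next token can be derived from the current token count. The following is a minimal sketch of that arithmetic, not the gateway's actual code: the function name `retry_after_seconds` is hypothetical, and rounding up to a whole second (with a minimum of 1 when the bucket is empty) is an assumption based on Retry-After carrying an integer number of seconds.

```python
import math


def retry_after_seconds(tokens: float, rate: float) -> int:
    """Whole seconds until a bucket holding `tokens` has at least one token.

    `rate` is the refill rate in tokens per second (the --rate-limit value).
    Rounding up to a whole second is an assumption of this sketch.
    """
    if rate <= 0:
        raise ValueError("rate must be positive; 0 means unlimited")
    if tokens >= 1:
        return 0  # a token is already available, no wait needed
    # Time to refill the missing fraction of a token, rounded up, at least 1s.
    return max(1, math.ceil((1 - tokens) / rate))


print(retry_after_seconds(0, 100))   # fast refill still reports 1 second
print(retry_after_seconds(0, 0.25))  # one token every 4 seconds
```

At high refill rates the computed wait is well under a second, so a sketch like this always reports 1 for an empty bucket.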
--rate-limit-capacity controls burst tolerance. A user with --rate-limit 10 --rate-limit-capacity 50 can send up to 50 requests instantly before rate limiting kicks in, then sustains 10/second.
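The burst example above can be traced with a small token bucket model. This is an illustrative sketch only: the `TokenBucket` class and its explicit `now` argument are hypothetical, and it assumes a new bucket starts full, which the text implies but does not state.

```python
class TokenBucket:
    """Token bucket with an explicit clock so the behavior is easy to trace."""

    def __init__(self, rate: float, capacity: float, now: float = 0.0):
        self.rate = rate          # tokens added per second (--rate-limit)
        self.capacity = capacity  # maximum stored tokens (--rate-limit-capacity)
        self.tokens = capacity    # assumption: a new bucket starts full
        self.updated = now        # time of the last refill, in seconds

    def allow(self, now: float) -> bool:
        """Refill for the elapsed time, then try to consume one token."""
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # the gateway would answer 429 here


# --rate-limit 10 --rate-limit-capacity 50
bucket = TokenBucket(rate=10, capacity=50)
burst = sum(bucket.allow(now=0.0) for _ in range(60))     # 60 requests at t=0
after_1s = sum(bucket.allow(now=1.0) for _ in range(60))  # 60 more at t=1s
print(burst, after_1s)  # 50 10
```

Of the first 60 requests, 50 pass and 10 are rejected. One second later the bucket has refilled by 10 tokens, so only 10 of the next 60 pass.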
Cleanup
Stale buckets (no activity for 5 minutes) are automatically cleaned up to prevent memory leaks in long-running gateways.
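One way to picture the cleanup is a periodic sweep over a map of last-activity timestamps. The sketch below is an assumption about the general shape, not the gateway's implementation: `sweep_stale`, the `last_seen` map, and the choice to treat exactly 5 minutes of inactivity as stale are all hypothetical, and how often the sweep runs is not specified above.

```python
IDLE_TTL_SECONDS = 5 * 60  # buckets idle this long count as stale


def sweep_stale(last_seen: dict[str, float], now: float,
                ttl: float = IDLE_TTL_SECONDS) -> list[str]:
    """Remove keys idle for at least `ttl` seconds; return the removed keys."""
    # Collect first, then delete, so the dict is not mutated while iterating.
    stale = [key for key, seen in last_seen.items() if now - seen >= ttl]
    for key in stale:
        del last_seen[key]
    return stale


# Timestamps in seconds: alice was last seen at t=0, bob at t=250.
last_seen = {"alice": 0.0, "bob": 250.0}
removed = sweep_stale(last_seen, now=300.0)
print(removed, sorted(last_seen))  # ['alice'] ['bob']
```

A user whose bucket was removed simply gets a fresh bucket on their next request, so the sweep affects memory use and not long-term limits.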
Unauthenticated Requests
If --jwks-url is not set, the rate limiter uses the client IP address as the bucket key.
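The key selection can be summarized as a small function. This is a hedged sketch: `bucket_key` and the `user:` / `ip:` prefixes are hypothetical names chosen to keep the two key spaces from colliding. The case where --jwks-url is set but a request carries no sub claim is not described above, so the sketch raises an error there instead of guessing.

```python
from typing import Optional


def bucket_key(jwks_url: Optional[str], jwt_sub: Optional[str],
               client_ip: str) -> str:
    """Pick the rate-limit bucket key for a request."""
    if not jwks_url:
        # No --jwks-url configured: fall back to the client IP address.
        return f"ip:{client_ip}"
    if jwt_sub is None:
        # Not covered by the text; this sketch refuses to guess.
        raise ValueError("authenticated mode requires a JWT sub claim")
    return f"user:{jwt_sub}"


print(bucket_key(None, None, "203.0.113.7"))
print(bucket_key("https://auth.example.com/jwks.json", "alice", "203.0.113.7"))
```

With IP-based keys, clients that share an address (for example behind a NAT or proxy) also share one bucket.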