What Is Rate Limiting? API Throttling Explained
Rate limiting controls how many requests a client or user can make to an API or service within a time window. It protects services from abuse, brute-force attacks, and accidental denial-of-service caused by buggy clients. Most public APIs enforce rate limits, and authentication endpoints in particular apply strict limits to slow credential-stuffing attacks.
Common Rate Limiting Algorithms
Token Bucket: a bucket fills with tokens at a steady refill rate, up to a fixed capacity; each request consumes one token. Clients can burst up to the bucket capacity, after which requests are throttled to the refill rate.
Fixed Window: count requests in a fixed time window (e.g., per minute). Simple, but a client can send up to double the limit by straddling a window boundary.
Sliding Window Log: track the timestamp of every recent request. Accurate, but memory-intensive at high request volumes.
Sliding Window Counter: approximate a sliding window using a weighted blend of the current and previous window counts. Nearly as accurate as the log, and memory-efficient.
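The token bucket described above can be sketched in a few lines of Python. The class name, the `rate`/`capacity` parameters, and the one-token-per-request cost are illustrative choices for this sketch, not a standard API:

```python
import time

class TokenBucket:
    """Token bucket limiter: refills at `rate` tokens/second up to `capacity`."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum burst size
        self.tokens = capacity      # start full, so an initial burst is allowed
        self.last = time.monotonic()

    def allow(self) -> bool:
        """Consume one token if available; return False to throttle the request."""
        now = time.monotonic()
        # Refill in proportion to elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

With `rate=5, capacity=10`, a client can fire 10 requests instantly, then sustain 5 per second: the bucket absorbs bursts while the refill rate caps the long-run throughput.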
HTTP Headers and the 429 Response
Widely used (de facto, not formally standardized) response headers: X-RateLimit-Limit (max requests per window), X-RateLimit-Remaining (requests left), X-RateLimit-Reset (Unix timestamp when the window resets). Return HTTP 429 Too Many Requests when the limit is exceeded, along with Retry-After (a standard HTTP header) telling clients how many seconds to wait before they can safely retry.
What to Rate Limit
Authentication endpoints: 5–10 attempts per minute per IP to prevent brute force.
Public APIs: limit by API key or authenticated user to enforce fair use.
Unauthenticated endpoints: limit by IP address, with generous limits.
Expensive operations (PDF generation, AI inference): strict per-user limits.
Health check endpoints: avoid rate-limiting these — monitoring systems need reliable access.
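These per-endpoint policies are often expressed as a simple lookup table consulted by the limiter. Every path, key strategy, limit, and window below is illustrative, not prescriptive:

```python
# Hypothetical policy table mapping an endpoint to its rate-limit rule.
# key_by: what identifies a client (IP, API key, or authenticated user).
RATE_LIMIT_POLICIES = {
    "/login":      {"key_by": "ip",      "limit": 10,   "window_s": 60},    # auth: strict per-IP
    "/api/v1":     {"key_by": "api_key", "limit": 1000, "window_s": 3600},  # public API: per key
    "/export/pdf": {"key_by": "user",    "limit": 5,    "window_s": 60},    # expensive op: per user
}

def policy_for(path: str):
    """Look up the rate-limit policy; None means do not rate-limit (e.g., health checks)."""
    return RATE_LIMIT_POLICIES.get(path)
```

Keeping policies in data rather than scattering them through handler code makes it easy to audit which endpoints are protected and to leave health checks deliberately unlisted.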