I was building an API for a client last week when I noticed something strange in the logs. A single user was making thousands of requests per minute to an endpoint that should have been called maybe once an hour. It wasn’t malicious, just a bug in their code, but it was enough to slow down the service for everyone else. That moment made me realize how fragile our systems can be without proper controls. It’s not just about stopping bad actors; it’s about creating a fair, stable environment for all users. If you’re building APIs that others depend on, this is a skill you need. Let’s talk about how to do it right.
Think of rate limiting as a traffic light for your API. It tells requests when to go, when to slow down, and when to stop. Without it, you risk everything from server crashes to huge cloud bills. But how do you choose the right method? The answer depends on what you’re protecting.
The simplest approach is the fixed window. Imagine a counter that resets every hour. If you allow 100 requests per hour, the 101st request in that hour gets blocked. It’s easy to understand and implement. But it has a flaw. What if 100 requests come in at 1:59 PM, and another 100 come in at 2:00 PM? You’ve just had 200 requests in one minute, which might still overload your system. This is called the “boundary problem.”
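To make that concrete, here's a minimal in-memory sketch of a fixed window counter. It only works within a single process, and a real version would prune old entries, so treat it as an illustration rather than production code:

// A minimal single-process fixed window counter (illustrative only)
const counters = new Map();

function fixedWindowAllows(userId, maxRequests = 100, windowMs = 60 * 60 * 1000) {
  const windowId = Math.floor(Date.now() / windowMs); // which window we are in
  const key = `${userId}:${windowId}`;
  const count = (counters.get(key) || 0) + 1;
  counters.set(key, count);
  return count <= maxRequests; // the 101st request in a window is blocked
}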
So, we need something smoother. This is where the sliding window comes in. Instead of a fixed block of time, we look at a moving window. If your limit is 100 per hour, we only count requests from the past 60 minutes at any given moment. This gives you a much more accurate picture of real-time traffic. It’s more complex but far more effective for production systems.
Then there’s the token bucket. Picture a bucket that holds tokens. The bucket refills at a steady rate. Each API request costs one token. If the bucket is empty, the request is denied. The clever part is that the bucket can have a capacity larger than the refill rate. This allows for short bursts of traffic, which is perfect for user-facing applications where activity isn’t always perfectly smooth.
For this to work in a real Node.js application with multiple servers, we need a shared state. This is where Redis shines. It’s fast, it’s in-memory, and every server in your cluster can talk to the same Redis instance. This gives us a single source of truth for counting requests. Let’s set it up.
First, we connect to Redis. Using a client like ioredis is a good practice: it maintains a single long-lived connection with automatic reconnection, so we don't open a new connection on every request.
// redisClient.js
const Redis = require('ioredis');

class RedisClient {
  constructor() {
    this.client = new Redis({
      host: process.env.REDIS_HOST,
      port: process.env.REDIS_PORT,
      // Reconnect with a backoff that grows 50ms per attempt, capped at 2 seconds
      retryStrategy(times) {
        return Math.min(times * 50, 2000);
      }
    });
    this.client.on('error', (err) => {
      console.error('Redis error:', err);
    });
  }

  getClient() {
    return this.client;
  }
}

// Export a single shared client so every module reuses one connection
module.exports = new RedisClient().getClient();
Now, let’s build a sliding window rate limiter as Express middleware. The key is to use a Redis sorted set. We’ll use timestamps as scores. This lets us easily remove old entries and count how many are left in our time window.
// slidingWindowLimiter.js
const redisClient = require('./redisClient');

async function slidingWindowLimiter(req, res, next) {
  const userId = req.user?.id || req.ip; // Use IP if no user is logged in
  const key = `rate_limit:${userId}:${req.path}`;
  const now = Date.now();
  const windowMs = 60 * 60 * 1000; // 1 hour window
  const maxRequests = 100;

  // A unique member per request, so two requests landing in the same
  // millisecond don't overwrite each other in the sorted set
  const member = `${now}-${Math.random().toString(36).slice(2)}`;

  // Lua script so the check-and-add runs atomically in Redis
  const luaScript = `
    local key = KEYS[1]
    local now = tonumber(ARGV[1])
    local window = tonumber(ARGV[2])
    local max = tonumber(ARGV[3])
    local member = ARGV[4]
    -- Remove timestamps older than the window
    redis.call('ZREMRANGEBYSCORE', key, 0, now - window)
    -- Count current requests
    local current = redis.call('ZCARD', key)
    if current < max then
      -- Allow the request and record its timestamp
      redis.call('ZADD', key, now, member)
      redis.call('PEXPIRE', key, window)
      return {1, max - current - 1} -- allowed, remaining
    else
      -- Deny the request and report when the window frees up
      local oldest = redis.call('ZRANGE', key, 0, 0, 'WITHSCORES')
      local resetTime = tonumber(oldest[2]) + window
      return {0, 0, resetTime} -- denied, remaining, reset timestamp
    end
  `;

  try {
    const result = await redisClient.eval(
      luaScript,
      1, // number of keys
      key,
      now.toString(),
      windowMs.toString(),
      maxRequests.toString(),
      member
    );
    const [allowed, remaining, resetTime] = result;

    // Set helpful headers for the API consumer
    res.setHeader('X-RateLimit-Limit', maxRequests);
    res.setHeader('X-RateLimit-Remaining', remaining);
    if (resetTime) {
      res.setHeader('X-RateLimit-Reset', new Date(resetTime).toISOString());
    }

    if (allowed === 0) {
      const retryAfter = Math.ceil((resetTime - now) / 1000);
      res.setHeader('Retry-After', retryAfter);
      return res.status(429).json({
        error: 'Too Many Requests',
        message: `Rate limit exceeded. Try again after ${new Date(resetTime).toISOString()}`,
        retryAfter
      });
    }

    next(); // Request is allowed, proceed
  } catch (error) {
    console.error('Rate limiter error:', error);
    // If Redis fails, let the request through. This is a "fail open" strategy.
    // You might choose "fail closed" for stricter security.
    next();
  }
}

module.exports = slidingWindowLimiter;
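Wiring the middleware into an app is straightforward. Here's a minimal sketch; the route and port are just placeholders:

// app.js
const express = require('express');
const slidingWindowLimiter = require('./slidingWindowLimiter');

const app = express();
app.use(slidingWindowLimiter); // apply to all routes; scope per-router if you prefer

app.get('/api/reports', (req, res) => {
  res.json({ status: 'ok' });
});

app.listen(3000);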
But what about different user tiers? A free user might get 100 requests per hour, while a premium user gets 10,000. We need a flexible system. We can store user tiers in a database and fetch the limit configuration dynamically.
// tieredLimiter.js
// Example tier values; tune these to match your own plans
const LIMITS = {
  free: { windowMs: 60 * 60 * 1000, maxRequests: 100 },
  pro: { windowMs: 60 * 60 * 1000, maxRequests: 10000 },
  enterprise: { windowMs: 60 * 60 * 1000, maxRequests: 100000 }
};

function getLimitConfig(tier) {
  return LIMITS[tier] || LIMITS.free; // Unknown tiers fall back to 'free'
}

async function tieredLimiter(req, res, next) {
  const userId = req.user.id;
  // Fetch the user's plan from your database
  const userPlan = await UserPlan.findOne({ userId });
  const limitConfig = getLimitConfig(userPlan?.tier); // e.g., 'free', 'pro', 'enterprise'
  // Now use limitConfig.windowMs and limitConfig.maxRequests
  // in the sliding window logic from above...
}
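One note on this sketch: hitting the database for the plan on every request adds latency to every call. In practice you'd cache the user's tier, either in memory or in Redis with a short TTL, and refresh it when the plan changes.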
Handling bursts gracefully is another challenge. The token bucket algorithm is excellent for this. We can implement it by storing a token count and a last-updated timestamp in Redis.
// tokenBucketLimiter.js
const redisClient = require('./redisClient');

// refillRate is tokens added per second
async function tokenBucketCheck(userId, bucketCapacity, refillRate) {
  const key = `token_bucket:${userId}`;
  const now = Date.now();
  const luaScript = `
    local key = KEYS[1]
    local now = tonumber(ARGV[1])
    local capacity = tonumber(ARGV[2])
    local refillPerMs = tonumber(ARGV[3]) / 1000 -- refill rate per millisecond
    local bucket = redis.call('HMGET', key, 'tokens', 'lastRefill')
    local tokens = tonumber(bucket[1]) or capacity
    local lastRefill = tonumber(bucket[2]) or now
    -- Calculate how many tokens have been added since the last check
    local timePassed = now - lastRefill
    local refillAmount = math.floor(timePassed * refillPerMs)
    tokens = math.min(capacity, tokens + refillAmount)
    if tokens >= 1 then
      -- Consume a token
      tokens = tokens - 1
      redis.call('HMSET', key, 'tokens', tokens, 'lastRefill', now)
      redis.call('PEXPIRE', key, 3600000) -- expire idle buckets after 1 hour
      return {1, tokens} -- allowed, remaining tokens
    else
      return {0, 0} -- denied, no tokens left
    end
  `;
  const result = await redisClient.eval(luaScript, 1, key, now, bucketCapacity, refillRate);
  return { allowed: result[0] === 1, remainingTokens: result[1] };
}

module.exports = tokenBucketCheck;
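To use this in a request pipeline, you can wrap the check in a small middleware. The capacity of 20 tokens and refill rate of 5 tokens per second below are illustrative values, not recommendations:

// tokenBucketMiddleware.js
const tokenBucketCheck = require('./tokenBucketLimiter');

async function tokenBucketMiddleware(req, res, next) {
  const userId = req.user?.id || req.ip;
  try {
    const { allowed, remainingTokens } = await tokenBucketCheck(userId, 20, 5);
    res.setHeader('X-RateLimit-Remaining', remainingTokens);
    if (!allowed) {
      return res.status(429).json({ error: 'Too Many Requests' });
    }
    next();
  } catch (error) {
    console.error('Token bucket error:', error);
    next(); // fail open, as in the sliding window limiter
  }
}

module.exports = tokenBucketMiddleware;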
What happens when a user hits their limit? Simply returning a 429 error is fine, but we can do better. Consider implementing a queue for certain critical operations, or returning a partial response with a warning. Communication is key. Always use standard headers like X-RateLimit-Remaining and Retry-After so clients can build smart retry logic.
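On the client side, those headers are what make smart retries possible. Here's a minimal sketch of a retry helper, assuming Node 18+ where fetch is available globally; the backoff values are illustrative:

// fetchWithRetry.js -- honors Retry-After, falls back to exponential backoff
async function fetchWithRetry(url, options = {}, maxAttempts = 3) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const res = await fetch(url, options);
    if (res.status !== 429) return res;
    // Prefer the server's Retry-After (in seconds); otherwise back off exponentially
    const retryAfter = Number(res.headers.get('Retry-After')) || 2 ** attempt;
    await new Promise((resolve) => setTimeout(resolve, retryAfter * 1000));
  }
  throw new Error(`Still rate limited after ${maxAttempts} attempts`);
}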
Monitoring is the final, crucial piece. You need to know when limits are being hit and by whom. Log these events and consider setting up alerts for unusual spikes. This data can also help you adjust your limits to better match real usage patterns.
// Simple logging in the limiter
if (!allowed) {
  console.warn(`Rate limit hit for user ${userId} on path ${req.path}`);
  // Send to your monitoring service (e.g., Datadog, Sentry)
  monitoringService.increment('rate_limit.hits', 1, { userId, path: req.path });
}
Building this changed how I see API design. It’s not just about making endpoints available; it’s about managing how they are used. It’s a commitment to reliability and fairness for every person or service that calls your code. Start with a simple fixed window, then move to a sliding window as your needs grow. Use Redis to make it work across your entire infrastructure. Remember, the goal isn’t to say “no” to requests, but to say “yes, in a way that works for everyone.”
Did you find this guide helpful? Have you encountered a rate limiting challenge that required a unique solution? Share your thoughts in the comments below—I’d love to hear about your experiences. If this article helped you build a more robust API, please consider liking and sharing it with other developers in your network.