Token Bucket Rate Limiter Using Spring Boot, Redis, and Bucket4j

Preetham Umarani
9 min read · Jun 29, 2022


In this tutorial, we will learn how to implement rate limiting in a scaled-out service.
We will use the Bucket4j library to implement it, with Redis as a distributed cache.

Why Use Rate Limiting?

Let’s get started with some basics to make sure we understand the need for rate limiting and introduce the tools we’ll be using in this tutorial.

Problem with Unlimited Rates

If a public API like the Twitter API allowed its users to make an unlimited number of requests per hour, it could lead to:

  • resource exhaustion
  • decreasing quality of the service
  • denial of service attacks

This might result in a situation where the service is unavailable or slow. It could also lead to more unexpected costs being incurred by the service.

How Rate Limiting Helps

Firstly, rate-limiting can prevent denial of service attacks. When coupled with a deduplication mechanism or API keys, rate limiting can also help prevent distributed denial of service attacks.

Secondly, it helps in estimating traffic. This is very important for public APIs. This can also be coupled with automated scripts to monitor and scale the service.

And thirdly, you can use it to implement tier-based pricing. This type of pricing model means that users can pay for a higher rate of requests. The Twitter API is an example of this.

The Token Bucket Algorithm

Token Bucket is an algorithm that you can use to implement rate limiting. In short, it works as follows:

  1. A bucket is created with a certain capacity (number of tokens).
  2. When a request comes in, the bucket is checked. If there is enough capacity, the request is allowed to proceed. Otherwise, the request is denied.
  3. When a request is allowed, the capacity is reduced.
  4. After a certain amount of time, the capacity is replenished.
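As a warm-up, the four steps above can be sketched as a minimal in-memory class (the class and method names here are illustrative; this is single-process only, and the distributed, Redis-backed version is what the rest of this tutorial builds):

```java
// Minimal in-memory token bucket: capacity, consume-on-request, periodic refill.
class SimpleTokenBucket {
    private final long capacity;
    private final long refillPeriodNanos;
    private long tokens;
    private long lastRefillNanos;

    SimpleTokenBucket(long capacity, long refillPeriodNanos) {
        this.capacity = capacity;               // step 1: bucket with a fixed capacity
        this.refillPeriodNanos = refillPeriodNanos;
        this.tokens = capacity;
        this.lastRefillNanos = System.nanoTime();
    }

    synchronized boolean tryConsume() {
        refillIfDue();
        if (tokens > 0) {                       // step 2: check capacity
            tokens--;                           // step 3: reduce capacity
            return true;
        }
        return false;                           // no tokens left: deny the request
    }

    private void refillIfDue() {
        long now = System.nanoTime();
        if (now - lastRefillNanos >= refillPeriodNanos) {
            tokens = capacity;                  // step 4: replenish after the period
            lastRefillNanos = now;
        }
    }
}
```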

How to Implement Token Bucket in a Distributed System

To implement the token bucket algorithm in a distributed system, we need to use a distributed cache.

The cache is a key-value store to store the bucket information. We will use a Redis cache to implement this.

Internally, Bucket4j allows us to plug in any implementation of the Java JCache API. The Redisson client of Redis is the implementation we will use.

Project Implementation

We will use the Spring Boot framework to build our service.

Our service will contain the below components:

  1. A simple REST API.
  2. A Redis cache connected to the service — using the Redisson client.
  3. The Bucket4J library wrapped around the REST API.
  4. We’ll connect Bucket4J to the JCache interface which will use the Redisson client as the implementation in the background.

First, we will learn to rate limit the API for all requests. Then we will learn to implement a more complex rate limiting mechanism per user or per pricing tier.

Let’s start with the project setup.

Install Dependencies

Let’s add the below dependencies to our pom.xml (or build.gradle) file.

<!-- To build the REST API -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
</dependency>

<!-- Redisson starter = Spring Data Redis starter (excluding other clients) + the Redisson client -->
<dependency>
    <groupId>org.redisson</groupId>
    <artifactId>redisson-spring-boot-starter</artifactId>
    <version>3.17.4</version>
</dependency>

<!-- Bucket4j starter = Bucket4j + JCache -->
<dependency>
    <groupId>com.giffing.bucket4j.spring.boot.starter</groupId>
    <artifactId>bucket4j-spring-boot-starter</artifactId>
    <version>0.5.2</version>
</dependency>

The version numbers here are illustrative; use the latest releases available.

Cache Configuration

Firstly, we need to start our Redis server. Let’s say we have a Redis server running on port 6379 on our local machine.

We need to perform two steps:

  1. Create a connection to this server from our application.
  2. Set up JCache to use the Redisson client as the implementation.

Redisson’s documentation provides concise steps to implement this in a regular Java application. We’re going to implement the same steps, but in Spring Boot.

Let’s look at the code first. We need to create a Configuration class to create the required beans.

@Configuration
public class RedisConfig {

    @Bean
    public Config config() {
        Config config = new Config();
        // Redis server running locally on the default port
        config.useSingleServer().setAddress("redis://localhost:6379");
        return config;
    }

    @Bean
    public CacheManager cacheManager(Config config) {
        CacheManager cacheManager = Caching.getCachingProvider().getCacheManager();
        cacheManager.createCache("cache", RedissonConfiguration.fromConfig(config));
        return cacheManager;
    }

    @Bean
    public ProxyManager<String> proxyManager(CacheManager cacheManager) {
        return new JCacheProxyManager<>(cacheManager.getCache("cache"));
    }
}

What does this do?

  1. Creates a configuration object that we can use to create a connection.
  2. Creates a cache manager using the configuration object. This will internally create a connection to the Redis instance and create a hash called “cache” on it.
  3. Creates a proxy manager that will be used to access the cache. Whatever our application tries to cache using the JCache API, it will be cached on the Redis instance inside the hash named “cache”.

Build the API

Let’s create a simple REST API.

@RestController
public class RateLimitController {

    @GetMapping("/user/{id}")
    public String getInfo(@PathVariable("id") String id) {
        return "Hello " + id;
    }
}
If I hit the API with the URL http://localhost:8080/user/1, I will get the response Hello 1.

Bucket4j Configuration

To implement the rate limiting, we need to configure Bucket4j. Thankfully, the starter library saves us from writing any boilerplate code.

It also automatically detects the ProxyManager bean we created in the previous step and uses it to cache the buckets.

What we do need to do is configure this library around the API we created.
Again, there are multiple ways to do this.

We can go for property-based configuration which is defined in the starter library.
This is the most convenient way for simple cases like rate-limiting for all users or all guest users.

However, if we want to implement something more complex like a rate limit for each user, it’s better to write custom code for it.

We are going to implement rate limiting per user. Let’s assume we have the rate limit for each user stored in a database, and we can query it using the user id.

Let’s write the code for it step by step.

Create a Bucket

Before we start, let’s look at how a bucket is created.

Refill refill = Refill.intervally(10, Duration.ofMinutes(1));
Bandwidth limit = Bandwidth.classic(10, refill);
Bucket bucket = Bucket4j.builder()
        .addLimit(limit)
        .build();
  • Refill — how many tokens are added back to the bucket, and how often (here, 10 tokens every minute).
  • Bandwidth — the bucket’s capacity combined with its refill policy; effectively, the requests allowed per refill period.
  • Bucket — an object configured using these two parameters. Additionally, it maintains a token counter to keep track of how many tokens are available in the bucket.

Using this as the building block, let’s change a few things to make it suitable to our use case.

Create and Cache Buckets using ProxyManager

We created the proxy manager for the purpose of storing buckets on Redis. Once a bucket is created, it needs to be cached on Redis and does not need to be created again.

To make this happen, we will replace the Bucket4j.builder() with proxyManager.builder(). ProxyManager will take care of caching the buckets and not creating them again.

ProxyManager’s builder takes two parameters — a key against which the bucket will be cached and a configuration object that it will use to create the bucket.

Let’s see how we can implement it:

@Service
public class RateLimiter {

    // the ProxyManager bean from our Redis configuration
    @Autowired
    private ProxyManager<String> buckets;

    public Bucket resolveBucket(String key) {
        Supplier<BucketConfiguration> configSupplier = getConfigSupplierForUser(key);

        // Does not always create a new bucket: returns the cached one if it exists.
        return buckets.builder().build(key, configSupplier);
    }

    private Supplier<BucketConfiguration> getConfigSupplierForUser(String key) {
        // In a real system, look up this user's limits in the database by key.
        Refill refill = Refill.intervally(20, Duration.ofMinutes(1));
        Bandwidth limit = Bandwidth.classic(20, refill);
        return () -> BucketConfiguration.builder()
                .addLimit(limit)
                .build();
    }
}

We have created a method which returns a bucket for a key provided. In the next step, we will see how to use this.

How to Consume Tokens and Set Up Rate Limiting

When a request comes in, we will try to consume a token from the relevant bucket.
We will use the tryConsume() method of the bucket to do this.

@GetMapping("/user/{id}")
public String getInfo(@PathVariable("id") String id) {
    // gets the bucket for the user
    Bucket bucket = rateLimiter.resolveBucket(id);

    // tries to consume a token from the bucket
    if (bucket.tryConsume(1)) {
        return "Hello " + id;
    } else {
        return "Rate limit exceeded";
    }
}
The tryConsume() method returns true if a token was consumed successfully, or false if the bucket has no tokens left.

Test it out using the Linux watch command:

watch -n 0.5 curl --location --request GET 'http://localhost:8080/user/1'

Github Link: link

Caveats of this approach:

The canonical algorithm for rate limiting with a rolling window is a token bucket (or its inverse sibling, the leaky bucket). Here’s how it works:

  • Each user has a bucket associated with them, containing a number of tokens.
  • When a user attempts to take an action, we check the number of tokens in the bucket.
  • If the bucket is empty, the user has exceeded the rate, and the action is blocked.
  • Otherwise, we remove a token from the bucket (or more, for “expensive” actions) and allow the action.
  • Over time, we refill all buckets at a set rate, up to some maximum capacity.

This is a clever algorithm that takes very little space (only a single integer counter per user), but the naive implementation has a big flaw — some process needs to keep refilling the buckets. With several million users, and each refill operation requiring a write, this would be an unsustainable load on our Redis instance. Here’s a more sophisticated approach:

  • Each user has two keys associated with them: the token bucket, and a timestamp of the last time the bucket was refilled.
  • When a user attempts to perform an action, we fetch the stored timestamp.
  • We calculate how many tokens the user should have been granted since that last request.
  • We can then proceed with the algorithm, using this new token count.
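In plain Java, the timestamp arithmetic behind this lazy refill looks roughly like the following (names are illustrative; in the real system the stored count and timestamp would be read from Redis, and this sketch deliberately ignores the multi-client race discussed next):

```java
import java.time.Duration;

// Lazy refill: instead of a background job topping up millions of buckets,
// we compute how many tokens accrued since the last refill, on demand.
class LazyRefill {

    // Token count the user should have "now", given the stored count and the
    // stored last-refill timestamp (both of which would live in Redis).
    static long refilledTokens(long storedTokens, long lastRefillMillis,
                               long nowMillis, long capacity,
                               long tokensPerInterval, Duration interval) {
        long intervalsPassed = (nowMillis - lastRefillMillis) / interval.toMillis();
        long newTokens = storedTokens + intervalsPassed * tokensPerInterval;
        return Math.min(newTokens, capacity); // never exceed bucket capacity
    }
}
```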

Unfortunately, this algorithm also has a problem: it fails when two processes need to share the rate limiter. Redis can batch operations into one atomic action, but to calculate how many tokens we need to give the user, we need at least two trips to Redis: one to get the last timestamp, and one to set the new token count. Without delving into Redis Lua scripting (something none of us had experience with, and that would be hard to mock out in tests), we couldn’t find a way to combine all needed operations into a single atomic set of Redis actions. Because of this, if two clients using the rate limiter both tried to verify a user action at the same time, we could have the following sequence of operations:

  • The user has enough tokens left for one action.
  • Not enough time has passed since the last action for more tokens to be granted.
  • Client 1 gets the stored timestamp and token count.
  • Client 2 gets the stored timestamp and token count.
  • Client 1 calculates that there is no need to add tokens, allows the action, and tells Redis to set the token count to 0.
  • Client 2 also calculates that there is no need to add tokens, allows the action, and tells Redis to set the token count to 0.

Needless to say, this is not ideal. If we have several dozen workers all processing push notification payloads, it would be possible for a user to be spammed with dozens of pushes all at once.

A better approach — sorted sets

Fortunately, Redis has another data structure that we can use to prevent these race conditions — the sorted set. Here’s the algorithm we came up with:

  • Each user has a sorted set associated with them. The members and scores are identical, and equal to the (microsecond) times when actions were attempted.
  • When a user attempts to perform an action, we first drop all elements of the set which occurred before one interval ago. This can be accomplished with Redis’s ZREMRANGEBYSCORE command.
  • We fetch all elements of the set, using ZRANGE(0, -1).
  • We add the current timestamp to the set, using ZADD.
  • We set a TTL equal to the rate-limiting interval on the set (to save space).
  • After all operations are completed, we count the number of fetched elements. If it exceeds the limit, we don’t allow the action.
  • We also can compare the largest fetched element to the current timestamp. If they’re too close, we also don’t allow the action.
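To make the bookkeeping concrete, here is an in-memory Java stand-in for the per-user sorted set (a TreeSet plays the role of the Redis ZSET, and the names are illustrative; in production each commented step maps to the Redis command shown, all issued inside one MULTI/EXEC block):

```java
import java.util.TreeSet;

// Sliding-window limiter over a sorted set of (microsecond) action timestamps.
class SlidingWindowLimiter {
    private final TreeSet<Long> actions = new TreeSet<>(); // stand-in for the Redis sorted set
    private final long windowMicros;
    private final int maxActions;

    SlidingWindowLimiter(long windowMicros, int maxActions) {
        this.windowMicros = windowMicros;
        this.maxActions = maxActions;
    }

    boolean attempt(long nowMicros) {
        // drop everything older than one interval (ZREMRANGEBYSCORE)
        actions.headSet(nowMicros - windowMicros).clear();
        // count what is left in the window (ZRANGE 0 -1)
        int inWindow = actions.size();
        // record this attempt whether or not it is allowed (ZADD) --
        // this is the caveat discussed below: blocked actions still count
        actions.add(nowMicros);
        // allow only if the window was not already full
        return inWindow < maxActions;
    }
}
```

Because the real version wraps the body of attempt() in a single MULTI/EXEC transaction, two processes can never interleave their reads and writes, which is exactly what removes the race of the two-key approach.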

The advantage of this approach is that all Redis operations can be performed as an atomic action, using the MULTI command. This means that if two processes both try to perform an action for the same user, there’s no way for them to not have the latest information, preventing the problem outlined above. It also allows us to use one limiter for both rates we wanted to track (i.e. no more than 10 messages per minute or 2 per 3 seconds).

However, there is one caveat to this approach — one we were comfortable with but may not fit others’ requirements. In our algorithm, you can see that whether or not an action is blocked is determined after all Redis operations are completed. This means that blocked actions still count as actions. So, if a user continually exceeds the rate limit, none of their actions will be allowed (after the first few), instead of allowing occasional actions through.

Java example for this is coming soon.

Clap if you liked this article. Until next time BBYE!


