Caching in Celery Using LRUCache

Caching large data across multiple Celery tasks with LRUCache in Flask.

When you have data-intensive or time-consuming tasks in your application, a good idea is to run them in the background when feasible so that the user experience is not compromised. Celery helps you run such tasks in the background.

But what happens if you need to cache a large resource in memory for multiple Celery tasks? One answer is to use Celery's built-in LRUCache for this purpose. Other caching options include, but are not limited to,

  • redis
  • memcache

An LRUCache implementation with Celery in the Flask framework is explained below.

What is Celery?

Celery is a task queue that can be used in your Python web application to run time-consuming tasks outside the HTTP request-response cycle. Celery is mainly used in two situations:

  • Asynchronous task processing: Consider a situation where you have to send emails to a large number of users when an HTTP request hits the server. Running them through Celery as asynchronous tasks keeps the request fast and increases the responsiveness of your application.
  • Scheduled job processing: This gives you the option to run tasks periodically, for example running complex installation tasks at midnight.

Setting up Celery with Flask

If you are familiar with setting up Celery with Flask, you can skip this section. A detailed explanation is not given here, as the purpose of this post is to explain caching in Celery.

A simple structure for the Flask application would be:
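A minimal layout, matching the files discussed in the following sections, might look like this (the project directory name is an assumption):

```
flask-celery-cache/
├── manage.py
├── config.py
├── celery_config.py
└── app/
    ├── __init__.py
    ├── run.py
    └── tasks.py
```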

First, let us add the application-specific and Celery-specific configurations. In config.py, write:
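The original snippet is not reproduced here, so the following is a sketch; the broker URL and setting names are assumptions (point them at your own Redis instance):

```python
# config.py
class Config:
    DEBUG = False

    # Celery broker and result backend; Redis is used later in this post.
    CELERY_BROKER_URL = "redis://localhost:6379/0"
    CELERY_RESULT_BACKEND = "redis://localhost:6379/0"
```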

Now our Celery-specific configuration (celery_config.py) will look like this:
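A minimal sketch of such a configuration module, assuming the modern lowercase Celery setting names:

```python
# celery_config.py
broker_url = "redis://localhost:6379/0"
result_backend = "redis://localhost:6379/0"

task_serializer = "json"
result_serializer = "json"
accept_content = ["json"]

# Modules the worker should import to discover tasks
imports = ("app.tasks",)
```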

Now we will write code for initializing the app in app/__init__.py.
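The original snippet is not shown; a common pattern (a sketch, and the `make_celery` helper name is my own) binds Celery tasks to the Flask application context:

```python
# app/__init__.py
from celery import Celery
from flask import Flask

from config import Config


def make_celery(app):
    """Create a Celery instance whose tasks run inside the Flask app context."""
    celery = Celery(app.import_name)
    celery.config_from_object("celery_config")

    class ContextTask(celery.Task):
        def __call__(self, *args, **kwargs):
            # Make Flask extensions and config available inside tasks
            with app.app_context():
                return self.run(*args, **kwargs)

    celery.Task = ContextTask
    return celery


app = Flask(__name__)
app.config.from_object(Config)
celery = make_celery(app)
```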

Our manage.py, which contains the code to run the app, is:
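A sketch of manage.py; the port is assumed from the URLs used later in this post, and `celery` is re-exported so the worker command `celery -A manage.celery worker` can find it:

```python
# manage.py
from app import app, celery  # noqa: F401  -- "celery" is used by the worker command

if __name__ == "__main__":
    app.run(port=5002)
```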

Now we will add our Celery tasks in app/tasks.py.
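A first, cache-free version of the tasks might look like this (task bodies and return values are placeholders of my own):

```python
# app/tasks.py
import time

from app import celery


@celery.task
def task1():
    # Placeholder for a long-running job
    time.sleep(5)
    return "task1 done"


@celery.task
def task2():
    time.sleep(5)
    return "task2 done"
```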

And finally, we will add run.py, which will handle the incoming HTTP requests.
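The routes below are a sketch based on the URLs used later in the post (/task1 and /task2); placing them in app/run.py is an assumption about the original layout:

```python
# app/run.py
from app import app
from app.tasks import task1, task2


@app.route("/task1")
def trigger_task1():
    task1.delay()  # queue the task; a worker executes it in the background
    return "task1 queued"


@app.route("/task2")
def trigger_task2():
    task2.delay()
    return "task2 queued"
```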

Adding Celery LRUCache

In the LRU (least recently used) caching scheme, the least recently used entry is evicted when the cache exceeds its limit. To add caching in Celery, we first need to add an LRUCache instance in config.py so it can be referenced later.

# Importing LRUCache from celery
from celery.utils.functional import LRUCache

class Config:
    ...
    # Setting the cache with a key limit of 10
    RESOURCE_CACHE = LRUCache(limit=10)

Now in our app/__init__.py we need to add a custom task class that will act as the base for both of our tasks. A detailed explanation of this can be found here.
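The base class below is a sketch consistent with the description in this post; the `load_resource` helper is hypothetical, standing in for whatever expensive loading your application does:

```python
# app/__init__.py (addition)
from celery import Task

from config import Config


class ResourceTask(Task):
    """Base task sharing one LRUCache across all tasks in a worker process."""

    def __init__(self):
        # Config.RESOURCE_CACHE is the LRUCache defined in config.py
        self.cache = Config.RESOURCE_CACHE

    @property
    def shared_resource_data(self):
        # Return the cached resource, loading it on the first access
        try:
            return self.cache["resource"]
        except KeyError:
            data = self.load_resource()
            self.cache["resource"] = data
            return data

    @staticmethod
    def load_resource():
        # Hypothetical stand-in for loading a large file, model, etc.
        return {"resource": "large data"}
```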

Finally, add the ResourceTask class as the base task for your tasks in app/tasks.py. This makes the LRUCache common to both tasks, since the cache is initialized in the __init__ method of ResourceTask. The shared_resource_data property of the base class is used by the child tasks and returns the cached data.
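A sketch of the updated app/tasks.py, assuming ResourceTask lives in app/__init__.py:

```python
# app/tasks.py
from app import ResourceTask, celery


@celery.task(base=ResourceTask, bind=True)
def task1(self):
    # Both tasks read the same in-memory cache through the shared base class
    return self.shared_resource_data


@celery.task(base=ResourceTask, bind=True)
def task2(self):
    return self.shared_resource_data
```

Note that the LRUCache lives in each worker process's memory: tasks executed by the same worker process share it, but separate worker processes each hold their own copy.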

Please make sure to install the dependencies and configure the broker (here, Redis). Now run the app and the Celery worker with the commands python manage.py and celery -A manage.celery worker --loglevel=INFO.

Now, in the browser, open http://127.0.0.1:5002/task1 and http://127.0.0.1:5002/task2 to run both tasks. You can see that the same resource is loaded by both tasks.

The complete code for this can be found here.

Conclusion

There are many caching options available. So before going with Celery's LRUCache, look at the other alternatives too, as Celery is not always the right choice.
