One of my Celery tasks requires access to a very large data structure. The model is taxing on memory once loaded and takes a very long time to load.
I am trying to avoid loading it repeatedly, i.e. on every call to "score_model". Preferably, I'd like to load this data into memory once at initialisation so that it is available only to this task when required.
(Usually pickle works great, but this model does not serialise easily, so I cannot store it in a cache. That would require a complex custom backend, and I'm not convinced it would yield significant time savings.)
The Celery docs suggest creating a base task class. This SO answer was also very helpful: Initializing a worker with arguments using Celery.
The answer hints at making objects available at initialisation for later use; it works well for centralising common functions and for keeping sessions open across workers. The following code works:
tasks.py
from celery import Task
from celery.decorators import task

class startup(Task):
    abstract = True

    def __init__(self):
        print("Loading...")
        self.score_model = load_and_bin('/opt/model.tar')  # load_and_bin is defined elsewhere in the app

@task(base=startup, bind=True, name="scoring")
def scoring(self, pk):
    data = entries.objects.filter(geo_pk=pk)  # entries is a Django model
    newscores = self.score_model(data)
...however this loads my model into memory multiple times at initialisation, even when starting celery beat (which never runs this task) or flower. I can see that little print appear more than once and I'm not sure why, so I quickly run out of resources. Registering the class by removing abstract = True still works, but I run out of resources even faster.
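One direction I've considered is deferring the load until the task first runs in a given process, so that merely importing tasks.py (which beat and flower do) stays cheap. A minimal sketch of that idea, reusing load_and_bin from above (I'm not sure this is the idiomatic Celery approach):

from celery import Task

class startup(Task):
    abstract = True
    _score_model = None  # nothing is loaded at import time

    @property
    def score_model(self):
        # Load lazily, once per worker process, on first access
        if self._score_model is None:
            print("Loading...")
            self._score_model = load_and_bin('/opt/model.tar')
        return self._score_model

Since Celery keeps a single task instance per process, the model would load at most once per worker process, and never in processes that only register the task without executing it.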
How can I run load_and_bin once and have the object available to the task later on? I start Celery with:

celery -A app worker
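For what it's worth, I'm on the default prefork pool, so I assume each child process would hold its own copy of the model even with a lazy load. Capping the pool limits that duplication, e.g.:

celery -A app worker --concurrency=1

but that also gives up parallelism, so I'd prefer a proper single-load solution.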