
I have a large read-only data structure (a graph loaded in networkx, though this shouldn't be important) that I use in my web service. The web service is built in Flask and served through Gunicorn. It turns out that every gunicorn worker I spin up holds its own copy of the data structure. Thus, my ~700 MB data structure, which is perfectly manageable with one worker, turns into a pretty big memory hog when I have 8 of them running. Is there any way I can share this data structure between gunicorn processes so I don't have to waste so much memory?

Eli
  • Have you considered using something like Redis to store the data and access it from each process? Would be very similar to shared memory as far as speed goes. – nathancahill Dec 02 '14 at 01:45
  • I would, but we're talking about a complex graph that there's no easy way to store in Redis (Redis has no directed edge graphs or general graph support currently AFAIK). – Eli Dec 02 '14 at 01:55
  • Did the solution work for you? If yes, can you let me know in detail how you did it? – neel Mar 11 '16 at 06:28

1 Answer


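It looks like the easiest way to do this is to tell gunicorn to preload your application using the preload_app option. With preloading, the application is imported once in the master process before the workers are forked, so on systems with a copy-on-write fork the workers can share the already-loaded memory pages instead of each loading their own copy. This assumes that you can load the data structure as a module-level variable: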

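from flask import Flask
from your.application import CustomDataStructure

app = Flask(__name__)

# Loaded once at import time, so a preloaded app builds it before workers fork.
CUSTOM_DATA_STRUCTURE = CustomDataStructure('/data/lives/here')

# @app.routes, etc.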
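For illustration, the preload setting can live in a gunicorn config file, which is itself plain Python. This is only a sketch: the file name `gunicorn.conf.py`, the bind address, and the worker count are assumptions chosen to match the question, not values from the original setup.

```python
# gunicorn.conf.py (hypothetical file name; values below are assumptions)

# Address the server listens on (assumed; adjust to your deployment).
bind = "127.0.0.1:8000"

# Number of worker processes; the question mentions running 8.
workers = 8

# Import the application in the master process before forking workers,
# so module-level data is built once instead of once per worker.
preload_app = True
```

It could then be started with something like `gunicorn -c gunicorn.conf.py your_module:app`, where `your_module` is a placeholder for the module that defines `app`. Passing `--preload` on the command line should have the same effect as `preload_app = True`. One caveat worth measuring: CPython updates reference counts inside objects, so pages that start out shared may still be copied gradually as workers touch the data, and the real savings can be smaller than the ideal.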

Alternatively, you could use a memory-mapped file (if you can wrap the shared memory with your custom data structure), gevent with gunicorn to ensure that you're only using one process, or the multiprocessing module to spin up your own data-structure server which you connect to using IPC.
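As a rough sketch of the memory-mapped file option, the example below stores a graph as a flat binary edge list and wraps a read-only mapping of that file. Everything here is hypothetical (the `MappedEdgeList` class, the `write_edges` helper, and the fixed-width record layout); it is not how networkx stores graphs, and a real graph would need a smarter on-disk layout than a linear scan.

```python
import mmap
import struct

# Each record is one directed edge: (source id, target id) as unsigned 32-bit ints.
RECORD = struct.Struct("<II")


def write_edges(path, edges):
    """Write (source, target) pairs to a flat binary file, one record per edge."""
    with open(path, "wb") as f:
        for src, dst in edges:
            f.write(RECORD.pack(src, dst))


class MappedEdgeList:
    """Read-only view over the edge file, backed by a memory map.

    Processes that map the same file read-only can share the underlying
    pages through the operating system's page cache.
    """

    def __init__(self, path):
        self._file = open(path, "rb")
        # Length 0 maps the whole file; the file must not be empty.
        self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)

    def __len__(self):
        return len(self._map) // RECORD.size

    def edge(self, index):
        """Return the (source, target) pair stored at the given record index."""
        return RECORD.unpack_from(self._map, index * RECORD.size)

    def successors(self, node):
        """Linear scan for targets of edges leaving `node` (simple, not fast)."""
        result = []
        for i in range(len(self)):
            src, dst = self.edge(i)
            if src == node:
                result.append(dst)
        return result

    def close(self):
        self._map.close()
        self._file.close()
```

Each worker would construct its own `MappedEdgeList` over the same file. The records are decoded on demand rather than loaded into Python objects up front, which is what keeps the per-process footprint small, at the cost of slower lookups.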

Sean Vieira
  • The preload option is not working. Can you provide an example of how to use it with a dummy data structure? – neel Mar 10 '16 at 07:02
  • @neel - you're probably better off asking another question with an example of your setup and what's not working. – Sean Vieira Mar 10 '16 at 15:39
  • I have posted the question here: http://stackoverflow.com/questions/35914587/how-to-get-a-concurreny-of-1000-requests-with-flask-and-gunicorn It would be great if you look at it once. Thanks in advance. – neel Mar 10 '16 at 15:44
  • A great read, although it didn't help me catch the parent process while using a Uvicorn worker. I did manage to stumble upon a solution that I think is even cleaner than the preload method: using a Python config file for gunicorn, `-c gconfig.py`. – aliqandil Dec 20 '20 at 06:20