20

I have a Flask app running under Gunicorn, using the sync worker type with 20 worker processes. The app reads a lot of data on startup, which takes time and uses memory. Worse, each process loads its own copy, which makes startup even slower and uses 20X the memory. The data is static and doesn't change. I'd like to load it once and have all 20 workers share it.

If I use the preload_app setting, the data is loaded only once in the master process before the workers are forked, and initially only takes 1X memory, but then it seems to balloon to 20X once requests start coming in. I need fast random access to the data, so I'd rather not do IPC.
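
For reference, a sketch of the settings involved (gunicorn.conf.py is Gunicorn's conventional config file; the values mirror the setup described above):

```python
# gunicorn.conf.py
workers = 20         # 20 processes; worker_class defaults to "sync"
preload_app = True   # import the app once in the master before forking
```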

Is there any way to share static data among Gunicorn processes?

2 Answers

7

Memory mapped files will allow you to share pages between processes.

https://docs.python.org/3/library/mmap.html
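
A minimal sketch of that approach (the file name data.bin and the fixed-width record layout are assumptions): build the dataset into a file once, then have every worker map it read-only, so the kernel backs all 20 mappings with the same physical pages.

```python
import mmap

# Each worker maps the same prebuilt file; read-only mappings of one
# file share physical pages across all processes.
with open("data.bin", "rb") as f:
    data = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)  # 0 = map whole file

# Random access by byte offset, e.g. 64-byte fixed-width records:
record = data[3 * 64:4 * 64]  # copies only this slice, never the whole file
```

Note that this shares raw bytes rather than Python objects, so lookups need a byte-level layout such as fixed-width records or a sidecar index.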

Note that memory consumption statistics are usually misleading, because shared pages get counted against every process that maps them. It is better to look at the output of vmstat and see whether you are swapping a lot.

  • I guess I should have said I wanted to share a normal Python dict, not just a blob of memory. – Doctor J Nov 11 '14 at 00:30
  • @DoctorJ Then you're out of luck. The reason is that a Python data structure is just pointers to pointers in memory, which span many pages. Most of those pages also contain data that gets written to (reference counts, for instance), so copy-on-write causes them to be duplicated in each process. I would recommend using an object store or key-value store like Redis; this is the "standard" solution to this problem today. IPC on localhost is very fast, and you are probably optimizing prematurely if you think it will be the bottleneck. – aaa90210 Nov 11 '14 at 00:39
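
To illustrate the Redis suggestion from the comment above, a minimal sketch (assuming a local Redis server and the redis-py client; the key names are made up):

```python
import redis

r = redis.Redis()  # defaults to localhost:6379

# One-time load: run this once (e.g. in a setup script), not in every worker.
r.hset("static_data", mapping={"some_key": "some_value"})

# Any worker then gets fast random access over localhost IPC:
value = r.hget("static_data", "some_key")  # returns b"some_value"
```

Storing the dict as a Redis hash lets each request fetch single keys instead of deserializing the whole blob.
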
2

Assuming your priority is to keep the data as a Python data structure rather than moving it to a database such as Redis, you'll have to change things so that your server runs in a single process.

Gunicorn can work with gevent to create a server that supports multiple clients within a single worker process using coroutines; that could be a good option for your needs.
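
A minimal sketch of that setup (module, route, and variable names are illustrative): load the data into the global scope at import time, then run a single gevent worker so every request coroutine shares the one copy.

```python
# myapp.py
from flask import Flask, jsonify

app = Flask(__name__)

# Loaded once at process start; all request coroutines in the single
# gevent worker read this same dict, so only one copy exists.
STATIC_DATA = {"example_key": "example_value"}  # stands in for the real load

@app.route("/lookup/<key>")
def lookup(key):
    return jsonify(value=STATIC_DATA.get(key))
```

Run it as one gevent worker (the gevent package must be installed): gunicorn -k gevent -w 1 myapp:app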

  • Can you explain this more? It sounds potentially valuable, but I'm not sure how it would work. – Eli Jul 29 '15 at 06:25
  • There isn't really much of a difference; Flask abstracts you from the details. When you use gevent you just need to make sure your view functions don't take large amounts of CPU time (or that they yield if they are lengthy tasks). Multitasking is achieved by putting each request in a coroutine, but this is all handled by the framework. – Miguel Grinberg Jul 29 '15 at 06:43
  • I meant more about getting code to run only in the parent process, which generates the data and keeps it updated. Very similar to this question: http://stackoverflow.com/questions/13768894/run-startup-code-in-the-parent-with-django-and-gunicorn – Eli Jul 29 '15 at 19:47
  • When you use gevent, eventlet, or other coroutine frameworks, there is a single process; there are no parent and children. At the start of the process you can load or generate any data you need and put it in the global scope, and it will be accessible to all your handlers, because they all run in the same process. Since the data is static (at least in this question it is), you don't even have to worry about locking. – Miguel Grinberg Jul 29 '15 at 21:44
  • @Miguel is there any performance difference between the gevent worker and the other choices, like the tornado worker, in Gunicorn? – erogol Jan 30 '17 at 15:53
  • If we use a single worker on a machine with more than one CPU, the other CPUs will sit idle, so multiple workers with gevent would be a good option to choose. – RaiBnod May 08 '21 at 05:54