I am using Flask served with gunicorn. My configuration file looks something like this:
timeout = 30
limit_request_line = 6000
max_requests = 500 # restart worker after this many requests
max_requests_jitter = 100
preload_app = True
workers = 2
With two sync workers and preload_app enabled, I expect most of my application code to be loaded in the parent process before forking. However, I've noticed that after the first couple of requests, the memory usage of the two worker processes jumps hugely.
[Image: two workers using a lot of RAM after the first requests come in]
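A per-process RAM figure can mix pages that are still shared with the parent and pages that are private to the worker, so it may help to split the number up. Below is a minimal sketch, assuming Linux with /proc/<pid>/smaps_rollup available (kernel 4.14 or newer). The helper names parse_smaps and memory_breakdown are hypothetical and not part of gunicorn or Flask.

```python
from pathlib import Path


def parse_smaps(text):
    """Sum the kB value of every field in an smaps or smaps_rollup dump.

    Hypothetical helper for illustration only.
    """
    totals = {}
    for line in text.splitlines():
        parts = line.split()
        # Field lines look like "Private_Dirty:      1234 kB";
        # mapping header lines have a different shape and are skipped.
        if (
            len(parts) == 3
            and parts[0].endswith(":")
            and parts[1].isdigit()
            and parts[2] == "kB"
        ):
            key = parts[0][:-1]
            totals[key] = totals.get(key, 0) + int(parts[1])
    return totals


def memory_breakdown(pid="self"):
    """Return shared vs. private memory in kB for a process (Linux only)."""
    fields = parse_smaps(Path(f"/proc/{pid}/smaps_rollup").read_text())
    return {
        "rss_kb": fields.get("Rss", 0),
        "pss_kb": fields.get("Pss", 0),
        "shared_kb": fields.get("Shared_Clean", 0) + fields.get("Shared_Dirty", 0),
        "private_kb": fields.get("Private_Clean", 0) + fields.get("Private_Dirty", 0),
    }
```

Calling memory_breakdown(worker.pid) before and after the first request would show whether the jump is in private_kb (memory that is genuinely no longer shared) or only in the overall resident size.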
I've looked for anything that might be loaded at "runtime", so to speak, rather than at Flask application setup time, and haven't found anything. I've used memory_profiler extensively without getting any useful data. I even tried making sure my app, models, and views are definitely loaded before forking:
def pre_fork(server, worker):
    print(f"PRE-FORK {server} {worker}")
    import myapp.views
While I can see that this hook is called and loads all of my views (which should load basically everything as far as the app is concerned), it makes no difference. What can I do to figure out what is causing the massive memory consumption on the first request in the worker processes? The memory usage is very stable and flat after the first requests, so I don't think there is an ongoing leak. I just want to find out what is not being loaded before the fork and is therefore not shared.
My main question is: what extra work is Flask doing when the first request comes in for a worker that could be loading extra bits into memory?
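One way to narrow this down could be to diff Python-level allocations around the first request in each worker. The following is a minimal sketch for the gunicorn configuration file, using the standard-library tracemalloc module together with gunicorn's post_fork and post_request server hooks. The helper top_growth and the module-level variables are hypothetical names, and the frame depth of 10 is an arbitrary choice.

```python
import tracemalloc

_baseline = None   # snapshot taken in the worker right after the fork
_reported = False  # ensures the report is logged only once per worker


def top_growth(before, after, limit=15):
    """Return the source lines whose allocated size grew the most.

    Hypothetical helper for illustration only.
    """
    stats = after.compare_to(before, "lineno")
    return [s for s in stats if s.size_diff > 0][:limit]


def post_fork(server, worker):
    """Gunicorn hook: runs in the worker process just after it is forked."""
    global _baseline
    tracemalloc.start(10)  # keep up to 10 stack frames per allocation
    _baseline = tracemalloc.take_snapshot()


def post_request(worker, req, environ, resp):
    """Gunicorn hook: runs in the worker after a request is processed."""
    global _reported
    if _reported or _baseline is None:
        return
    _reported = True
    for stat in top_growth(_baseline, tracemalloc.take_snapshot()):
        worker.log.info("first-request allocation: %s", stat)
    tracemalloc.stop()  # avoid tracing overhead on later requests
```

Note that tracemalloc only sees memory allocated through Python's allocators. It would not show pages that become private through copy-on-write, or memory allocated directly by C libraries, so an empty report here would still be informative.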