72

I am running a Flask application on Kubernetes, in a Docker container, with Gunicorn managing the workers that respond to API requests.

The following warning message appears regularly, and it seems like requests are being cancelled as a result. In Kubernetes, the pod shows no odd behavior or restarts and stays within 80% of its memory and CPU limits.

[2021-03-31 16:30:31 +0200] [1] [WARNING] Worker with pid 26 was terminated due to signal 9

How can we find out why these workers are killed?

Jodiug
  • 5,425
  • 6
  • 32
  • 48
  • Did you manage to find out why? Having the same issue, and tried specifying `--shm-size` - but to no avail. – lionbigcat Jun 03 '21 at 09:56
  • Our problems seem to have gone away since we started using `--worker-class gevent`. I suspect Simon is right and this was either an out of memory error, or a background process running for too long and the main process (1) decided to kill it. – Jodiug Jun 04 '21 at 07:49
  • Meta: I'm not sure why this question is being downvoted. Please drop a comment if you feel it needs further clarification. – Jodiug Jun 04 '21 at 07:53
  • 2
    I have the same problem, and gevent did not solve it. Does anyone know why this started all of a sudden? Was there a change in gunicorn or in kube? – Blop Jun 13 '21 at 06:57
  • Also related to an unanswered question: https://stackoverflow.com/questions/57745100/gunicorn-issues-on-gcloud-memory-faults-and-restarts-thread – Blop Jun 13 '21 at 11:53
  • @Blop - my issue was OOM-related. I had to use a larger instance with more RAM, and gave the docker container access to that RAM. – lionbigcat Jun 15 '21 at 16:57
  • @lionbigcat ye, eventually that's exactly what I did as well. just adding another 1GB fixed the problem. no need to change to gevent. – Blop Jun 16 '21 at 08:21
  • I faced the same issue and solved it by switching from python 3.8 to python 3.7 – Vincent Agnes Aug 29 '21 at 21:42

10 Answers

67

I encountered the same warning message.

[WARNING] Worker with pid 71 was terminated due to signal 9

I came across this FAQ, which says that "A common cause of SIGKILL is when OOM killer terminates a process due to low memory condition."

I used dmesg and realized that it was indeed killed because it was running out of memory.

Out of memory: Killed process 776660 (gunicorn)
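
If you are in a similar situation, a minimal sketch of how to confirm this from the kernel log (the grep pattern and flags are just illustrative; run it on the host or node, since the kernel ring buffer is not namespaced):

# Show kernel messages with human-readable timestamps and filter for OOM kills
dmesg -T | grep -i -E "out of memory|killed process"

# On systemd hosts, the same information is usually available from the journal
journalctl -k | grep -i "killed process"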
Simon
  • 686
  • 5
  • 2
  • 2
    Our problems seem to have gone away since we started using `--worker-class gevent`. I can't verify this answer, but it seems that `dmesg` is a good way to get more information and diagnose the problem. Thanks for your answer! – Jodiug Jun 04 '21 at 07:49
  • I noticed this happen when I didn't provide enough memory to Docker Desktop, which was running Gunicorn workers within a container. Increasing the memory to Docker Desktop solved the problem. – phoenix Jan 03 '23 at 12:57
29

In our case the application was taking around 5-7 minutes to load ML models and dictionaries into memory, so adding a timeout of 600 seconds solved the problem for us.

gunicorn main:app \
   --workers 1 \
   --worker-class uvicorn.workers.UvicornWorker \
   --bind 0.0.0.0:8443 \
   --timeout 600
Yoooda
  • 31
  • 2
  • 7
ACL
  • 409
  • 3
  • 4
3

I encountered the same warning message when I limited the Docker container's memory, e.g. with `-m 3000m`.

See docker-memory and gunicorn - Why are Workers Silently Killed?

The simple way to avoid this is to give Docker a higher memory limit, or not to set one at all.
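
For example, a sketch of how the limit might be raised or removed (the image name and sizes are placeholders):

# Start the container with a larger memory limit (or drop -m entirely)...
docker run -m 6g my-gunicorn-image

# ...or raise the limit on an already-running container
docker update --memory 6g --memory-swap 6g <container-id>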

hstk
  • 163
  • 2
  • 10
2

I was using AWS Elastic Beanstalk to deploy my Flask application and I had a similar error.

In the log I saw:

  • web: MemoryError
  • [CRITICAL] WORKER TIMEOUT
  • [WARNING] Worker with pid XXXXX was terminated due to signal 9

I was using a t2.micro instance, and when I changed it to t2.medium my app worked fine. In addition to this, I changed the timeout in my nginx config file.
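
For reference, a sketch of what that nginx change can look like; the directive values are taken from the author's follow-up comment below, while the file path is an assumption:

# /etc/nginx/conf.d/timeout.conf
keepalive_timeout     600s;
proxy_connect_timeout 600s;
proxy_send_timeout    600s;
proxy_read_timeout    600s;
fastcgi_send_timeout  600s;
fastcgi_read_timeout  600s;
client_max_body_size  20M;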

Vkey
  • 41
  • 5
  • Mind sharing the timeout variable name? – Snehangsu Jun 16 '22 at 15:28
  • Below are the contents of my timeout.conf file under the nginx conf.d folder: keepalive_timeout 600s; proxy_connect_timeout 600s; proxy_send_timeout 600s; proxy_read_timeout 600s; fastcgi_send_timeout 600s; fastcgi_read_timeout 600s; client_max_body_size 20M; – Vkey Jun 20 '22 at 10:47
1

It may be that your liveness check in Kubernetes is killing your workers.

If your liveness check is configured as an HTTP request to an endpoint in your service, a long-running main request may block the health check request, and the worker gets killed by your platform because the platform thinks that the worker is unresponsive.

That was my case. I have a gunicorn app with a single uvicorn worker, which only handles one request at a time. It worked fine locally, but the worker would sporadically get killed when deployed to Kubernetes. It would only happen during a call that takes about 25 seconds, and not every time.

It turned out that my liveness check was configured to hit the /health route every 10 seconds, time out in 1 second, and retry 3 times. So this call would time out sometimes, but not always.

If this is your case, a possible solution is to reconfigure your liveness check (or whatever health check mechanism your platform uses) so that it waits long enough for your typical request to finish, or to allow for more threads - anything that ensures the health check is never blocked long enough to trigger a worker kill.

You can see that adding more workers may help with (or hide) the problem.

Also, see this reply to a similar question: https://stackoverflow.com/a/73993486/2363627
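
As a rough illustration (not the author's exact configuration; the path, port, and numbers are placeholders to be tuned so they exceed your slowest expected request), a more forgiving probe in the container spec could look like this:

livenessProbe:
  httpGet:
    path: /health          # assumes the app exposes a health endpoint here
    port: 8000
  initialDelaySeconds: 30  # give gunicorn time to boot
  periodSeconds: 30        # probe less often
  timeoutSeconds: 10       # tolerate responses delayed by a long-running request
  failureThreshold: 3      # require ~90s of consecutive failures before a restart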

Gena Kukartsev
  • 1,515
  • 2
  • 17
  • 19
1

I encountered the same problem too, and it was because Docker's memory usage was limited to 2GB. If you are using Docker Desktop, you just need to go to Resources and increase the memory dedicated to Docker (otherwise you need to find the equivalent docker command-line option).

If that doesn't solve the problem, then it might be the timeout that kills the worker; you will need to add a timeout argument to the gunicorn command:

CMD ["gunicorn","--workers", "3", "--timeout", "1000", "--bind", "0.0.0.0:8000", "wsgi:app"]
Yoooda
  • 31
  • 2
  • 7
1

In my case, I needed to connect to a remote database on a private network that requires me to connect to a VPN first, and I forgot that.

So, check your database connection, or anything else that causes your app to wait for a long time.

afifabroory
  • 11
  • 1
  • 4
  • Please phrase this as an explained conditional answer, in order to avoid the impression of asking a clarification question instead of answering (for which a comment should be used instead of an answer, compare https://meta.stackexchange.com/questions/214173/why-do-i-need-50-reputation-to-comment-what-can-i-do-instead ). For example like "If your problem is ... then the solution is to .... because .... ." – Yunnosch Mar 26 '23 at 04:05
  • This does not provide an answer to the question. Once you have sufficient [reputation](https://stackoverflow.com/help/whats-reputation) you will be able to [comment on any post](https://stackoverflow.com/help/privileges/comment); instead, [provide answers that don't require clarification from the asker](https://meta.stackexchange.com/questions/214173/why-do-i-need-50-reputation-to-comment-what-can-i-do-instead). - [From Review](/review/late-answers/34105903) –  Mar 30 '23 at 02:55
1

In my case, I first noticed that decreasing the number of workers from 4 to 2 worked. However, I believe the problem was related to the connection to the db: I tried `-w 4` again after restarting the server that hosts the db, and it worked perfectly.

Mithsew
  • 1,129
  • 8
  • 20
0

In my case the problem was a long application startup caused by ML model warm-up (over 3s).

EgurnovD
  • 165
  • 1
  • 4
0

Check memory usage

In my case, I could not use the dmesg command, so I checked memory usage with a docker command:

sudo docker stats <container-id>

CONTAINER ID   NAME               CPU %     MEM USAGE / LIMIT   MEM %     NET I/O        BLOCK I/O         PIDS
289e1ad7bd1d   funny_sutherland   0.01%     169MiB / 1.908GiB   8.65%     151kB / 96kB   8.23MB / 21.5kB   5

In my case, the workers were not being terminated because of memory.

Yoooda
  • 31
  • 2
  • 7
  • Hey. Did you find anything other than memory that could kill your workers? – Sami Boudoukha May 23 '23 at 17:30
  • @SamiBoudoukha Actually my case was not because of a memory issue. I use Django and it failed to connect to the database internally, with no failure log. Nothing else. – Yoooda May 24 '23 at 00:20