
I was working on deploying my web application via Google App Engine when I encountered a 502 Bad Gateway error (nginx). After running `gcloud app logs read`, I tracked the error down to:

    2020-05-12 00:15:59 default[20200511t163633] "GET /input/summary" 200
    2020-05-12 00:16:38 default[20200511t163633] [2020-05-12 00:16:38 +0000] [1] [CRITICAL] WORKER TIMEOUT (pid:9)
    2020-05-12 00:16:38 default[20200511t163633] [2020-05-12 00:16:38 +0000] [9] [INFO] Worker exiting (pid: 9)
    2020-05-12 00:16:38 default[20200511t163633] [2020-05-12 00:16:38 +0000] [15] [INFO] Booting worker with pid: 15
    2020-05-12 00:16:38 default[20200511t163633] "POST /input/summary" 502

For those wondering, my app.yaml looks like this:

    runtime: custom
    env: flex
    
    runtime_config:
      python_version: 3
    
    resources:
      cpu: 4
      memory_gb: 16
      disk_size_gb: 25
    
    readiness_check:
      app_start_timeout_sec: 900

My Dockerfile looks like this:

    FROM gcr.io/google-appengine/python

    RUN virtualenv /env -p python3.7

    ENV VIRTUAL_ENV /env
    ENV PATH /env/bin:$PATH

    ADD requirements.txt /app/requirements.txt
    RUN pip3 install -r /app/requirements.txt

    ADD . /app

    RUN apt-get update \
        && apt-get install tesseract-ocr -y

    EXPOSE 8080
    ENTRYPOINT ["gunicorn", "--bind=0.0.0.0:8080", "main:app"]
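For context, gunicorn's sync workers default to a single worker process and a 30-second timeout, which is in the same ballpark as the `WORKER TIMEOUT` in the logs above. A hedged sketch of a tuned `ENTRYPOINT` (the worker count of 9 follows the commonly cited `(2 × CPUs) + 1` rule of thumb for this 4-CPU instance; the flags are standard gunicorn options, and 120 s is an illustrative value, not a recommendation from the logs):

```dockerfile
# Sketch only: gunicorn defaults to 1 sync worker and a 30 s timeout.
# (2 x CPUs) + 1 = 9 workers is a common starting point for a 4-CPU
# machine; --timeout raises the per-request limit for long OCR jobs.
ENTRYPOINT ["gunicorn", "--bind=0.0.0.0:8080", \
            "--workers=9", "--timeout=120", "main:app"]
```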

I am running the app through:

    if __name__ == '__main__':
        app.run(debug=True, host='0.0.0.0', port=8080)

Everything seems to work fine on localhost, but the problems arise when I deploy to Google App Engine. Does anyone know what the root of the issue might be? Thanks in advance!

Samrat Sahoo
  • By default, Gunicorn uses sync workers, and each worker can only handle one request at a time. Also by default, gunicorn only uses one of these workers. I would suggest checking the Google documentation for the [recommended gunicorn configuration](https://cloud.google.com/appengine/docs/flexible/python/runtime#recommended_gunicorn_configuration). Let me know if this works for you. – tzovourn May 12 '20 at 11:22

1 Answer


Is it failing to deploy? Or does the deploy succeed, but the server fails to run?

If it's failing to deploy, you may be hitting the 10-minute timeout that deploy jobs have to finish within. You can increase this limit by setting your local gcloud config:

    gcloud config set app/cloud_build_timeout 1200s
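As a side note, this setting only changes how long the Cloud Build job behind `gcloud app deploy` may run; it does not affect request timeouts at runtime. To confirm the value was stored (assuming a reasonably recent gcloud CLI, where `config get-value` is a standard command):

```shell
# Print the locally stored Cloud Build timeout used by `gcloud app deploy`
gcloud config get-value app/cloud_build_timeout
```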

Alex
  • It deploys fine. It's specific functionality within the web app that causes it to fail. – Samrat Sahoo May 11 '20 at 17:47
  • Oh, I see. These people seem to mitigate the issue by bumping up gunicorn's timeout, passing it the arg `--timeout 90`: https://stackoverflow.com/questions/10855197/gunicorn-worker-timeout-error. Another suggestion from that thread is passing gunicorn `--log-level=DEBUG` to try to get a stack trace. I noticed you set `app_start_timeout_sec: 900`; what prompted that? – Alex May 11 '20 at 18:01
  • I am not sure about the `app_start_timeout_sec: 900`; it was used to fix some deployment error I was having. I just remember it had to be between 300 and 1800. For the `--timeout 90`, would I add an entrypoint in the app.yaml, or would it go in the Dockerfile? If in the Dockerfile, how exactly would I format it (I am new to Dockerfiles)? – Samrat Sahoo May 11 '20 at 18:11
  • It looks like you add it to this line: `ENTRYPOINT ["gunicorn", "--bind=0.0.0.0:8080", "main:app", "--timeout=90", "--log-level=DEBUG"]` – Alex May 11 '20 at 20:01
  • Or maybe like this (I'm not super familiar with Dockerfiles either): `ENTRYPOINT ["gunicorn", "--bind=0.0.0.0:8080", "main:app", "--timeout 90", "--log-level debug"]` – Alex May 11 '20 at 20:04
  • Unfortunately, the problem continues to persist even with these changes. I am starting to think this is a problem with RAM usage, because the 2 features this error occurs on are the most RAM-intensive processes. Any thoughts? – Samrat Sahoo May 12 '20 at 00:21
  • `because the 2 features this error occurs on` — wait, are there more logs? What are the cases where this error happens vs. when it doesn't? If the problem were RAM, you should've seen an out-of-memory type of error instead of `WORKER TIMEOUT`. What is the timeout, how many seconds? And what action causes this timeout? – Alex May 12 '20 at 00:32
  • The logs for the 2 features say exactly the same thing aside from the GET and POST logs (first and last logs), but that's to be expected. The actual error itself is the same. For the timeout, is there somewhere we can find it? Unless you mean what I set it to in the Dockerfile, which is 90 seconds. The action that causes the timeout in this case is a button click that runs some OCR on an image. – Samrat Sahoo May 12 '20 at 00:47
  • The time from when your service receives the request from your button click to when `WORKER TIMEOUT` gets printed. If you put a log statement right at the beginning of your request handler, you can just do the math on the timestamps of the log statements. – Alex May 12 '20 at 00:50
  • It seems to time out after about 39 seconds. – Samrat Sahoo May 12 '20 at 00:59
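To make that measurement less manual, here is a small standard-library sketch of the kind of timing log discussed in the comments (`run_ocr` and `handle_summary` are hypothetical stand-ins for the real route handler and tesseract call, not code from the question):

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger(__name__)

def run_ocr(image):
    # Stand-in for the real tesseract/OCR call; sleeps to mimic a slow job.
    time.sleep(0.1)
    return "text"

def handle_summary(image):
    # Hypothetical body for the POST /input/summary handler: log on entry
    # and on exit so the elapsed time appears directly in the app logs.
    start = time.monotonic()
    log.info("OCR request received")
    result = run_ocr(image)
    log.info("OCR finished in %.2f s", time.monotonic() - start)
    return result
```

Comparing the "received" timestamp against the `WORKER TIMEOUT` line then gives the elapsed time without any manual arithmetic on gcloud log timestamps.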