2

I have an nginx configuration that proxies requests to a Django REST service (through Gunicorn).

Everything works correctly, but when the response takes too long to produce (more than 30 seconds) I get a 503 Service Unavailable error. I'm sure this is the cause, because other requests work fine; it only fails on specific requests where the response is large and fetching the data from a third-party API takes too long.

Below is my nginx configuration:

server {
    listen       www.server.com:80;
    server_name  www.server.com;

    client_max_body_size 200M;
    keepalive_timeout    300;

    location /server/ {
        proxy_pass            http://127.0.0.1:8000/;
        proxy_connect_timeout 120s;
        proxy_read_timeout    300s;
        client_max_body_size  200M;
    }

    location / {
        root   /var/www/html;
        index  index.html index.htm;
    }
}

I am sure the issue is in nginx and not Gunicorn, because if I do a curl from inside the machine I get a response.

Thanks,

Dany Y
  • Please post the output of your [NGINX error log](https://stackoverflow.com/q/1706111/2532070). Also post the output (if any) in `/var/log/syslog` for one of these errors and the error log for your Gunicorn application. What is the server setup and what is the browser you're running (assuming this is a web request)? – YPCrumble Feb 15 '18 at 22:24
  • That's the problem: there's no error on either side, and the application is actually returning the result (although late) – Dany Y Feb 16 '18 at 10:20
  • If the application is returning the correct result, without an error, but slowly, is the problem that the third-party API server is too slow to respond? I'm not sure what the problem to be solved is. Is there an entry in NGINX access log? You're saying that NGINX shows a 503 error in its access log but nothing in the error log? – YPCrumble Feb 16 '18 at 16:01
  • Could it be that the Django application is failing? It seems to be the case. – mimsugara Feb 16 '18 at 16:06
  • @DanyY, see if my answer helps – Tarun Lalwani Feb 17 '18 at 05:51

2 Answers

3

You do specify proxy_connect_timeout and proxy_read_timeout, but never proxy_send_timeout. (TBH, I don't think you need to modify the timeout for connect(2), as that call simply establishes the TCP connection and doesn't depend on the size or response time of an individual page; but the other two seem like fair game.)

Additionally, as per https://stackoverflow.com/a/48614613/1122270, another consideration might be proxy_http_version — your curl is probably using HTTP/1.1, whereas nginx proxies with HTTP/1.0 by default, and your backend might behave differently.
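For reference, a minimal sketch of that location block with the suggested directives added (the timeout values are illustrative, not tuned for your workload):

location /server/ {
    proxy_pass            http://127.0.0.1:8000/;
    proxy_http_version    1.1;    # speak HTTP/1.1 to the upstream, as curl does
    proxy_connect_timeout 120s;   # time allowed to establish the TCP connection
    proxy_send_timeout    300s;   # time allowed between two successive writes to the upstream
    proxy_read_timeout    300s;   # time allowed between two successive reads from the upstream
    client_max_body_size  200M;
}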

cnst
  • I added both `proxy_http_version` and `proxy_send_timeout` but I'm still facing this issue, any idea? – Dany Y Feb 15 '18 at 16:06
  • I would probably look into the issue with [tcpdump](http://mdoc.su/o/tcpdump) to see where the error occurs. Are you doing caching with proxy_pass? It might be helpful to disable http://nginx.org/r/proxy_buffering, which could be done either for these long-running requests within a specific location (see the sketch after these comments), or through an HTTP header. BTW, the whole idea of long-running requests is often moot — why exactly does anyone need to wait over 30 seconds for your request to complete?! – cnst Feb 15 '18 at 20:52
  • I'll try tcpdump to see what's happening. As for the 30 seconds, it's simply because I'm using a travel booking API from the backend, and it sometimes takes this long to respond. Do you think keepalive_timeout is of any use in this case? – Dany Y Feb 16 '18 at 10:20
  • I suspect it may have something to do with `proxy_buffering`, then. If the external API that your backend uses is so slow to confirm the reservation, it's generally been best practice to simply send a few bytes to the client until the whole page can be loaded (which would require turning off `proxy_buffering` in nginx to work properly); or, another good approach is to make the page refresh itself every 10 seconds (just using the HTML http-equiv refresh), until the backend has the confirmation, which is when you show the proper page instead of the refresh placeholder. – cnst Feb 16 '18 at 21:03
  • Thank you, this seems the most reliable way. When you say `best practice to simply send a few bytes to the client until the whole page can be loaded`, do you mean manually store the data in the database and make another call to fetch it back? – Dany Y Feb 18 '18 at 15:38
  • For the refresh suggestion, yes, just store it yourself before another repeat request comes in; for the keepalive approach, just send some small pieces of noop data to the client to keep the connection active (e.g., HTML comments etc). In the old days, that's how the chat would be implemented — an HTML page that loads *very slowly*, and, basically, never completes. – cnst Feb 18 '18 at 22:36
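As suggested in the comments above, a minimal sketch of disabling proxy buffering for the slow location only, so nginx streams the response to the client as the backend produces it (alternatively, the backend can send an `X-Accel-Buffering: no` response header to switch buffering off per response):

location /server/ {
    proxy_pass       http://127.0.0.1:8000/;
    proxy_buffering  off;   # pass the upstream response through as it arrives, without buffering
}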
1

When you run the command below:

$ gunicorn --help | grep -A2 -i time
  --graceful-timeout INT
                        Timeout for graceful workers restart. [30]
  --do-handshake-on-connect
                        Whether to perform SSL handshake on socket connect
--
  -t INT, --timeout INT
                        Workers silent for more than this many seconds are
                        killed and restarted. [30]

The default worker timeout is 30 seconds, which matches the point at which your requests start failing. So I would assume the timeout comes from Gunicorn, not from nginx, and you don't just need to increase the timeout on the nginx side but also on the Gunicorn side.

You can either add

timeout=180

to your config.py file, or you can pass it on the command line when launching Gunicorn:

gunicorn -t 180 ......
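For illustration, a minimal sketch of what such a Gunicorn config file might contain (the bind address matches the proxy_pass target in the question; the worker count and the `myproject.wsgi:application` module path are assumptions about the setup, not taken from the question):

# config.py -- Gunicorn settings (illustrative)
bind = "127.0.0.1:8000"   # same address nginx proxies to
workers = 2               # assumed worker count
timeout = 180             # workers silent for more than 180 s are killed and restarted

# launched with, for example:
#   gunicorn -c config.py myproject.wsgi:application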
Tarun Lalwani