I have a Rails app that uploads data to a Node.js endpoint. The endpoint works for smaller data sets, but at a certain size it begins to time out consistently with a 504 error. After the 504 appears, the Node.js logs show the endpoint getting hit, and a few minutes later the image created from the uploaded data shows up in S3. I know it sounds, right off the bat, like I should just upload the data to S3 from the Rails app, but let's take it as a given that that's a no-go.
Looking at the Nginx access logs for the Node.js application, I see a 499 being returned for the endpoint.
According to https://httpstatuses.com/499, a 499 means the client closed the connection.
This person had a similar issue: NginX issues HTTP 499 error after 60 seconds despite config. (PHP and AWS)
However, I've already implemented their solution (increasing the idle timeout for the associated ELB on AWS). No dice, despite confirmation here (How to figure out Nginx status code 499) that the ELB is likely the 'client' closing the connection to the Node.js application.
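For concreteness, the change amounts to something like the following sketch using the AWS SDK for JavaScript against the classic ELB API; the load balancer name and region are placeholders, not my actual values:

// Sketch: raising the classic ELB idle timeout above its 60-second default.
// LoadBalancerName and region below are placeholders.
const AWS = require('aws-sdk');

const elb = new AWS.ELB({ region: 'us-east-1' });

elb.modifyLoadBalancerAttributes({
  LoadBalancerName: 'my-node-elb',
  LoadBalancerAttributes: {
    ConnectionSettings: { IdleTimeout: 600 } // seconds
  }
}, (err, data) => {
  if (err) console.error(err);
  else console.log(data.LoadBalancerAttributes);
});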
The ELB logs show a 504 error with the following values:
request_processing_time = -1
backend_processing_time = -1
response_processing_time = -1
elb_status_code = 504
backend_status_code = 0
received_bytes = 2167746
sent_bytes = 0
I'm not 100% sure how to interpret this, since the ELB clearly received all the bytes and forwarded them to the Node.js server (or else my image would never have been created). I'm assuming sent_bytes refers to bytes sent back to the client as the response, in which case 0 makes perfect sense since the request timed out.
This post outlines some potential interpretations: https://hardwarehacks.org/blogs/devops/2015/12/29/1451416941200.html
Based on this article, the log line I'm seeing seems to indicate: "The application did not respond to the ELB at all, instead closing its connection when data was requested. This is a fast timeout — the 504 will typically be returned in a matter of milliseconds, well under the ELB's timeout setting."
That doesn't sound correct, since it takes about two minutes to time out. Two minutes is the default Node.js server timeout, so I thought perhaps I hadn't effectively lengthened it, but it is set to something significantly longer for debugging purposes (10 minutes), which I confirmed with:
time telnet localhost 3500
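For reference, the Node-side change is along these lines (a minimal sketch with a bare http server standing in for my actual app on port 3500):

// Sketch: raising the Node.js server timeout from its 120000 ms (2 minute) default.
// The handler here is a stand-in for the real upload endpoint.
const http = require('http');

const server = http.createServer((req, res) => {
  // ... accept the uploaded data, generate the image, push it to S3 ...
  res.end('ok');
});

server.listen(3500);
server.setTimeout(10 * 60 * 1000); // 10 minutes for debugging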
I'm using Puma as my Rails server, and Puma does not include a request timeout mechanism; even if it did, no errors are being logged on that side. I also considered the HTTParty timeout, but then I wouldn't be getting a 504 response; HTTParty would raise a Net timeout error instead.
My nginx config is based on https://www.digitalocean.com/community/tutorials/how-to-set-up-a-node-js-application-for-production-on-ubuntu-14-04
server {
    listen 80;

    location / {
        proxy_pass http://127.0.0.1:3500;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection '';
        proxy_set_header Host $host;

        client_max_body_size 20m;
        client_body_timeout 600s;
        keepalive_timeout 600s;
        send_timeout 600s;
        proxy_connect_timeout 600s;
        proxy_send_timeout 600s;
        proxy_read_timeout 600s;
    }
}
This isn't my area of expertise, and I'm out of ideas. I'd love some direction, ideas, clarifications, etc. on how to make these larger data files successfully go through without timing out. Hopefully there's something super obvious I overlooked.