I've seen this asked here, here, and here - but without any good answers and was hoping to maybe get some closure on the issue.
I have an ELB connected to 6 instances all running Tomcat7. Up until Friday there were seemingly no issues at all. However, starting about five days ago we started getting around two 504 GATEWAY_TIMEOUT from the ELB per day. That's typically 2/2000 ~ .1%. I turned on logging and see
2018-06-27T12:56:08.110331Z momt-default-elb-prod 10.196.162.218:60132 - -1 -1 -1 504 0 140 0 "POST https://prod-elb.us-east-1.backend.net:443/mobile/user/v1.0/ HTTP/1.1" "BackendClass" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2
But my Tomcat7 logs don't have any 504s present at all, implying that the ELB is rejecting these requests without even communicating with the Tomcat.
I've seen people mention setting the Tomcats timeout to be greater than the ELB's timeout - but if that were what were happening (i.e. Tomcat times out and then ELB shuts down), then shouldn't I see a 504 in the Tomcat logs?
Similarly, nothing has changed in the code in a few months. So, this all just started seemingly out of nowhere, and is too uncommon to be a bigger issue. I checked to see if there were some pattern in the timeouts (i.e. tomcat restarting or same instance etc.) but couldn't find anything.
I know other people have run into this issue, but any and all help would be greatly appreciated.