
I've seen this asked here, here, and here, but without any good answers, so I was hoping to get some closure on the issue.

I have an ELB in front of 6 instances, all running Tomcat 7. Up until Friday there were seemingly no issues at all. However, starting about five days ago, we began getting around two 504 GATEWAY_TIMEOUT responses from the ELB per day. That's roughly 2 out of 2,000 requests, or about 0.1%. I turned on logging and see:

2018-06-27T12:56:08.110331Z momt-default-elb-prod 10.196.162.218:60132 - -1 -1 -1 504 0 140 0 "POST https://prod-elb.us-east-1.backend.net:443/mobile/user/v1.0/ HTTP/1.1" "BackendClass" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2

But my Tomcat 7 logs don't contain any 504s at all, which implies that the ELB is rejecting these requests without ever reaching Tomcat.
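
To make that reading of the log entry concrete, here is a minimal Python sketch that splits the line into named fields. It assumes the field order AWS documents for Classic Load Balancer access logs; the helper names `parse_entry` and `never_dispatched` are my own and not part of any AWS tooling.

```python
import shlex

# Field order for Classic Load Balancer access log entries
# (assumed from the AWS access log documentation).
FIELDS = [
    "timestamp", "elb", "client", "backend",
    "request_processing_time", "backend_processing_time",
    "response_processing_time", "elb_status_code", "backend_status_code",
    "received_bytes", "sent_bytes", "request", "user_agent",
    "ssl_cipher", "ssl_protocol",
]


def parse_entry(line):
    """Split one access log line into a dict keyed by field name."""
    # shlex keeps the quoted request and user-agent strings as single tokens.
    return dict(zip(FIELDS, shlex.split(line)))


def never_dispatched(entry):
    """True when the ELB logged no backend and no request timing (hypothetical helper)."""
    return entry["backend"] == "-" and entry["request_processing_time"] == "-1"


# The 504 entry quoted above.
line = (
    '2018-06-27T12:56:08.110331Z momt-default-elb-prod 10.196.162.218:60132 - '
    '-1 -1 -1 504 0 140 0 '
    '"POST https://prod-elb.us-east-1.backend.net:443/mobile/user/v1.0/ HTTP/1.1" '
    '"BackendClass" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2'
)
entry = parse_entry(line)
print(entry["elb_status_code"], entry["backend_status_code"], never_dispatched(entry))
```

For this entry the backend field is `-`, all three processing times are `-1`, and the backend status code is `0`, which is consistent with the request never being handed to a Tomcat instance.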

I've seen people mention setting Tomcat's timeout to be greater than the ELB's timeout, but if that were what was happening (i.e. Tomcat times out and then the ELB gives up), shouldn't I see a 504 in the Tomcat logs?

Also, nothing has changed in the code in a few months, so this seems to have started out of nowhere, and it is too infrequent to look like a larger problem. I checked whether there was some pattern in the timeouts (e.g. a Tomcat restart, or the same instance each time) but couldn't find anything.
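
For anyone who wants to repeat the pattern check, this is roughly how the 504s could be bucketed by hour and by client IP. It is a sketch only: it assumes the Classic Load Balancer log layout (status code in the eighth field), `count_504s` is a hypothetical helper, and the second sample line is made up purely for illustration.

```python
import shlex
from collections import Counter


def count_504s(lines):
    """Count ELB 504 responses per hour and per client IP."""
    by_hour, by_client = Counter(), Counter()
    for line in lines:
        fields = shlex.split(line)
        # Index 7 is elb_status_code in the assumed Classic ELB log layout.
        if len(fields) < 8 or fields[7] != "504":
            continue
        by_hour[fields[0][:13]] += 1                  # e.g. "2018-06-27T12"
        by_client[fields[2].rsplit(":", 1)[0]] += 1   # drop the client port
    return by_hour, by_client


sample = [
    # The real 504 entry from the question.
    '2018-06-27T12:56:08.110331Z momt-default-elb-prod 10.196.162.218:60132 - '
    '-1 -1 -1 504 0 140 0 '
    '"POST https://prod-elb.us-east-1.backend.net:443/mobile/user/v1.0/ HTTP/1.1" '
    '"BackendClass" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2',
    # A made-up successful request, included only to show that non-504s are skipped.
    '2018-06-27T12:57:00.000000Z momt-default-elb-prod 10.0.0.1:40000 10.0.0.2:8080 '
    '0.00004 1.1 0.00002 200 200 140 512 '
    '"POST https://example.invalid:443/ HTTP/1.1" '
    '"HypotheticalAgent" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2',
]
by_hour, by_client = count_504s(sample)
print(dict(by_hour), dict(by_client))
```

With only about two 504s per day, the counters will stay sparse, so any clustering by hour or by client should be easy to spot by eye.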

I know other people have run into this issue, but any and all help would be greatly appreciated.

LivingRobot
  • To help understand the problem and gather more data, you can enable VPC flow logs and ELB access logs, and then create CloudWatch alerts based on the VPC flow logs. https://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/flow-logs.html#working-with-flow-logs https://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/flow-logs.html https://docs.aws.amazon.com/elasticloadbalancing/latest/classic/access-log-collection.html – strongjz Jun 27 '18 at 19:31
  • I would also monitor the Latency after you enable the ELB logs. https://docs.aws.amazon.com/elasticloadbalancing/latest/classic/ts-elb-error-message.html#ts-elb-errorcodes-http504 – strongjz Jun 27 '18 at 19:39
  • I had already enabled the ELB logs (that's where the posted entry comes from), and the weird thing is that the timeouts all come when the latency is much lower. The average latency is about 1100 ms, but for requests that receive a 504 the latency is 500 ms. So it seems to time out almost immediately. – LivingRobot Jun 27 '18 at 21:25
  • Have you walked this troubleshooting doc? https://aws.amazon.com/premiumsupport/knowledge-center/elb-latency-troubleshooting/ – strongjz Jun 28 '18 at 19:47
