I have microservices written in Node/Express, hosted on EC2 behind an Application Load Balancer.

Some users are getting a 502 even before the request reaches the server.

I log every request inside each instance, and there are no log entries for the failing requests. I can see the request immediately before the 502 and the requests right after it, which is why I assume the failing request never reaches the servers. Most users work around this by refreshing the page or using an incognito tab, which makes the connection go to a different machine (we have 6).

I can tell from the load balancer logs that the load balancer responds almost immediately to the request with 502. I guess that this could be a TCP RST.

I had a similar problem a long time ago, and I had to add keepAliveTimeout and headersTimeout to the Node configuration. Here are my settings (the LB is still using its default idle timeout of 60 s):

server.keepAliveTimeout = 65000;
server.headersTimeout = 80000;
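
For context, here is a minimal sketch of where these two settings live. It assumes a plain `http` server and the ALB's default 60 s idle timeout; the `createServer` wrapper and the `ALB_IDLE_TIMEOUT_MS` constant are illustrative names, not part of my real code. With Express, the object returned by `app.listen()` is the same kind of `http.Server`, so the properties can be set on it in the same way.

```javascript
const http = require('http');

// Assumption: the ALB idle timeout is left at its 60 s default.
const ALB_IDLE_TIMEOUT_MS = 60000;

// Hypothetical helper: builds a server whose timeouts outlast the ALB's.
function createServer(handler) {
  const server = http.createServer(handler);
  // Keep idle sockets open longer than the ALB does, so the ALB
  // (and not Node) is the side that closes an idle connection.
  server.keepAliveTimeout = ALB_IDLE_TIMEOUT_MS + 5000; // 65000
  // headersTimeout is kept larger than keepAliveTimeout.
  server.headersTimeout = ALB_IDLE_TIMEOUT_MS + 20000; // 80000
  return server;
}

const server = createServer((req, res) => res.end('ok'));
// server.listen(3000); // the port is an arbitrary example
```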

The metrics of all instances, especially memory and CPU usage, look fine.

These 502 errors started after an update in which we introduced several packages, for instance axios. At first I thought axios could be the cause, because keep-alive is not enabled in it by default, but enabling it didn't help. Other than axios, we only use the request package.

Any tips on how I should debug or fix this issue?

soltex
  • How do you know the 502 occurs before the request reaches the server? Some insight into how you're checking that could be illuminating. – Arlen Anderson Aug 11 '21 at 15:37
  • Through the logs that I have inside each instance, I have the log of the previous request that succeeded immediately before the 502, and the requests after the 502, but not the request that originated the 502. Fortunately, a coworker got the 502, and I could debug with his help, and I know exactly what was the request and what time. – soltex Aug 11 '21 at 15:44
  • I already increased the health check timeout of that target group, hoping to decrease the number of unhealthy instances (most probably related to some of the 502 errors), but without any luck: I still have the same number of unhealthy instances and 502 errors. – soltex Aug 11 '21 at 15:46
  • Could you provide some info about your setup? Are you confident that your instances are in AZs that are enabled for your LB? Also, what LB is that? ALB? – Marcin Aug 14 '21 at 06:07
  • I think everything is fine with the LB, and yes, it is the ALB. I can provide all the details you need. – soltex Aug 16 '21 at 08:29

4 Answers

An HTTP 502 from a load balancer usually means the load balancer could not get a valid response from the target. That would explain why the requests never reach your server: presumably the load balancer can't reach the server for some reason.

This link has some hints regarding how to get logs from a classic load balancer. However, since you didn't specify, you might be using an application load balancer, in which case this link might be more useful.

stijndepestel
  • I will enable the load balancer logs and see if there's some useful information – soltex Aug 13 '21 at 12:43
  • I found out that the web instances have a huge CPU usage peak right before they are terminated, but I don't know what is causing this. – soltex Aug 13 '21 at 14:40
  • From the LB access logs, I can say that connection might be getting terminated by the target. However, I don't know why this is happening, I already have the keep-alive configured. – soltex Aug 17 '21 at 15:40
  • I'm not sure we can help you without more information. Problem is, it's also difficult to determine which information we might need if you have no idea where the problem might be. Have you looked at all the error logs of your nginx etc? – stijndepestel Aug 17 '21 at 17:12
  • As I said, the problem might be in the target closing the connection, I will try to confirm this by using a packet capture. And I am not using nginx. Let me know if you need any other information. – soltex Aug 17 '21 at 17:33
  • @soltex has your problem been solved? I am facing the same issue and I have already set up keepAliveTimeout. – HafizMuhammad Shoaib Mar 11 '22 at 15:20
  • @HafizMuhammadShoaib yes, I have solved the problem. Check my answer below: https://stackoverflow.com/a/68927075/2908330 – soltex Mar 11 '22 at 15:24

I had the same problem for a month or two and couldn't find a solution. I also had AWS Premium Support, but they weren't able to find one either. I was getting the 502 error randomly, maybe 10 times per day. Finally, after reading the AWS docs, I found this:

The target receives the request and starts to process it, but closes the connection to the load balancer too early. This usually occurs when the duration of the keep-alive timeout for the target is shorter than the idle timeout value of the load balancer.

https://aws.amazon.com/premiumsupport/knowledge-center/elb-alb-troubleshoot-502-errors/

SOLUTION:

I was running the Apache web server on EC2, so I increased KeepAliveTimeout to 65 seconds, above the load balancer's idle timeout. This did the trick for me.

Eric Aya
Srinivas

From the ALB access logs I knew that either the ALB couldn't connect to the target or the connection was being terminated immediately by the target.
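
As a rough illustration of that check, the sketch below pulls the first ten space-separated fields out of an ALB access log entry and flags 502s for which no target status code was recorded. It assumes the field order documented by AWS for ALB access logs; the helper names and the sample line are made up for illustration and are not taken from my real logs.

```javascript
// Names of the first ten space-separated fields of an ALB access log entry
// (assumption: the field order documented by AWS).
const FIELDS = [
  'type', 'time', 'elb', 'client', 'target',
  'request_processing_time', 'target_processing_time',
  'response_processing_time', 'elb_status_code', 'target_status_code',
];

// Hypothetical helper: parses only the unquoted prefix of a log line.
function parseAlbPrefix(line) {
  const parts = line.split(' ');
  const entry = {};
  FIELDS.forEach((name, i) => { entry[name] = parts[i]; });
  return entry;
}

// A 502 with "-" as target status means the ALB answered on its own,
// without receiving a response from the target.
function isLbGenerated502(entry) {
  return entry.elb_status_code === '502' && entry.target_status_code === '-';
}

// Made-up sample line, for illustration only.
const sample = 'https 2021-08-17T15:40:00.123456Z app/my-alb/0123456789abcdef ' +
  '203.0.113.10:51234 10.0.1.15:3000 0.001 -1 -1 502 - 9000 277 ' +
  '"GET https://example.com:443/ HTTP/1.1"';

const entry = parseAlbPrefix(sample);
console.log(entry.target, isLbGenerated502(entry));
```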

The most difficult part was figuring out how to reproduce the 502 error.

It looks like the Node version I was using has a request header size limit of 8 KB. If any request exceeded that limit, the target would reject the connection, and the ALB would return a 502 error.

Solution:

I solved the issue by adding --max-http-header-size=size to the node start command line, where size is a value in bytes greater than 8 KB.
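
What made this hard to spot is that oversized headers are rejected by Node's HTTP parser before any request handler runs, so request-level logging never sees them. A minimal sketch of how to make such rejections visible is below. It assumes a Node version that supports the `maxHeaderSize` option of `http.createServer` (older versions only have the command-line flag); `MAX_HEADER_BYTES` is a hypothetical value chosen for illustration.

```javascript
const http = require('http');

// Hypothetical limit for illustration; size it to your largest real headers.
const MAX_HEADER_BYTES = 32 * 1024;

// Effective process-wide limit (reflects --max-http-header-size if given).
console.log('default max header size:', http.maxHeaderSize);

const server = http.createServer({ maxHeaderSize: MAX_HEADER_BYTES }, (req, res) => {
  res.end('ok');
});

// Parser-level failures surface here, not in the request handler.
server.on('clientError', (err, socket) => {
  console.error('clientError:', err.code); // e.g. HPE_HEADER_OVERFLOW
  if (err.code === 'ECONNRESET' || !socket.writable) {
    return; // nothing can be written back on this socket
  }
  const status = err.code === 'HPE_HEADER_OVERFLOW'
    ? '431 Request Header Fields Too Large'
    : '400 Bad Request';
  socket.end(`HTTP/1.1 ${status}\r\nConnection: close\r\n\r\n`);
});
// server.listen(3000); // the port is an arbitrary example
```

Sending a request with a very large cookie or custom header to a server set up like this should log the overflow error, which is one way to reproduce the failure locally.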

soltex

A few common reasons for an AWS Load Balancer 502 Bad Gateway:

  1. Make sure the public subnets that your ALB is targeting are set to auto-assign a public IP (so that instances deployed there are automatically assigned a public IP).
  2. Make sure the security group for your ALB allows HTTP and/or HTTPS traffic from the IPs that you are connecting from.
snowtimber
  • This didn't completely solve my issue, but it had a tremendous impact. – arthurakay Oct 20 '22 at 15:17
  • How are these reasons for a 502? Please clarify. Be careful with the suggestion of assigning public IP addresses to your instances: typically they are hidden behind an ALB and are private for a reason. You will be exposing your instances to the internet. – advance512 Mar 08 '23 at 11:30
  • This doesn't answer the question of why the 502 happens. Instead, it suggests changes that will affect the system architecture as well as the system security. – 4hbane May 25 '23 at 16:18