AWS Network load balancer - What is client reset count (and why is it high)

Question

The documentation for the various client/target/elb reset count metrics (TCP_Client_Reset_Count, TCP_Target_Reset_Count, TCP_ELB_Reset_Count) just says they count RST packets. I tried to understand what a RST packet is, and it seems to have to do with broken TCP connections. My load balancer has a single, long-term, seemingly successful client connection. Why do I see on the order of 100 client resets per hour? I also see about 10 load balancer resets per hour, and 0 target resets.

EDIT: I just observed that increasing the size of the server instance (I'm using Fargate and went from 0.25 vCPU to 0.5 vCPU) led to a 10-fold reduction in client resets per hour. The number of load balancer resets did not change.

score 5 · Answer 1 · answered Mar 29 '18 at 11:49

My hunch is that this is related to a bug in the Network Load Balancer that causes it to send 100x as many health checks as it should. See: "NLB Target Group health checks are out of control". My theory is that a bug causes the health check connection to be broken in an unclean way if the target instance is not quick enough. These broken health check connections get reported as "client resets" even though they should be reported as "ELB resets" or not reported at all.

score 4 · Answer 2 · answered Mar 28 '18 at 19:02

There are many reasons for a TCP RST to be sent. Some are abnormal and indicate errors, and some are normal connection cleanups that the TCP/IP stack or application performs.

An example of a normal TCP RST would be a long-lived connection that exceeds some time limit imposed by one side or the other. Once the time limit is exceeded, the connection can be forcibly closed, which generates the RST.

An example of an abnormal TCP RST would be an application that abruptly disconnected due to an internal error.

A poorly written application can also cause TCP RSTs when it closes the connection without first performing a graceful shutdown on the TCP socket.
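To make the difference concrete, here is a minimal Python sketch of a graceful close next to an abortive one. The function names `close_gracefully` and `close_abruptly` are made up for illustration, and the `SO_LINGER` packing assumes a Linux or other Unix-like stack. It is not taken from the asker's application, only a way to see which kind of close produces a RST.

```python
import socket
import struct


def close_gracefully(sock: socket.socket, timeout: float = 5.0) -> None:
    """Half-close with FIN, wait for the peer to finish, then close."""
    try:
        sock.settimeout(timeout)
        # Send FIN: tells the peer we have no more data to send.
        sock.shutdown(socket.SHUT_WR)
        # Drain until the peer closes its side (recv returns b"").
        while sock.recv(4096):
            pass
    except OSError:
        # Timeout or an already-broken connection: nothing left to drain.
        pass
    finally:
        sock.close()


def close_abruptly(sock: socket.socket) -> None:
    """Abortive close: SO_LINGER on with a zero timeout sends a RST."""
    # Assumption: struct linger is two C ints, as on Linux and macOS.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
    sock.close()
```

A peer of `close_gracefully` sees an orderly end of stream (its `recv` returns `b""`), while a peer of `close_abruptly` typically gets a "connection reset" error. Closing a socket that still has unread data in its receive buffer can also trigger a RST on many stacks, which is one way an otherwise well-behaved application ends up in the reset metrics.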

I will guess that the behavior you are seeing is not a problem. However, to really know, you will need to do a wire trace and protocol analysis on each connection to determine exactly what is happening.

Like I said, there is one client connection, and it is stable. The new connection metric (NewFlowCount) stays at 0. Can a RST occur without breaking the connection? `TCP_Client_Reset_Count` is "The total number of reset (RST) packets sent from a client to a target." Can this be spam traffic? But I would guess that spam connections that arrive at dead ports on the ELB would go towards `TCP_ELB_Reset_Count`, "The total number of reset (RST) packets generated by the load balancer." — Aleksandr Dubinsky, Mar 28 '18 at 21:32

score 0 · Answer 3 · Sree Lasya Vallabhaneni

One reason the load balancer reset count might be high is that the Network Load Balancer has an idle timeout of 350 seconds. If no traffic passes over your TCP connection within that period, the load balancer forcibly closes the connection.
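If the idle timeout is the cause, one common mitigation is to enable TCP keepalive on the long-lived connection with an idle interval shorter than 350 seconds. The sketch below is a hedged illustration: `enable_keepalive` and `NLB_IDLE_TIMEOUT_S` are names invented here, the default values are arbitrary, and it assumes keepalive probes count as traffic for the load balancer's idle timer (which may not hold for every listener type).

```python
import socket

# Idle timeout quoted in this answer; treated here as an assumption.
NLB_IDLE_TIMEOUT_S = 350


def enable_keepalive(sock: socket.socket, idle_s: int = 300,
                     interval_s: int = 30, probes: int = 3) -> None:
    """Turn on TCP keepalive so probes start before the idle timeout."""
    if idle_s >= NLB_IDLE_TIMEOUT_S:
        raise ValueError("idle_s must be below the load balancer idle timeout")
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The options below are platform-specific (present on Linux),
    # so each one is applied only if this Python build exposes it.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
```

Call it on the socket before or right after connecting, for example `enable_keepalive(sock)`. An application-level heartbeat message sent more often than every 350 seconds would serve the same purpose.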

edited Sep 10 '19 at 16:50

answered Sep 09 '19 at 19:06


is there a way to change this number? – Franklin Rivero Aug 12 '20 at 15:39
@FranklinRivero not according to https://docs.aws.amazon.com/elasticloadbalancing/latest/network/network-load-balancers.html#connection-idle-timeout – pyb Oct 02 '20 at 15:35
