I am using Amazon ECS on AWS Fargate. My instances can access the internet, but connections drop after 350 seconds. Roughly 5 out of every 100 calls from my service fail with `ConnectionResetError: [Errno 104] Connection reset by peer`. I found a couple of suggestions for fixing this in my server-side code (see here and here).

Cause

If a connection that's using a NAT gateway is idle for 350 seconds or more, the connection times out.

When a connection times out, a NAT gateway returns an RST packet to any resources behind the NAT gateway that attempt to continue the connection (it does not send a FIN packet).

Solution

To prevent the connection from being dropped, you can initiate more traffic over the connection. Alternatively, you can enable TCP keepalive on the instance with a value less than 350 seconds.
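
At the socket level, "TCP keepalive with a value less than 350 seconds" corresponds roughly to the options below. This is a minimal standard-library sketch, not specific to requests; the helper name `enable_keepalive` and the values 300/60/3 are assumptions chosen for illustration, and the `TCP_KEEP*` constants are platform-dependent (they are available on Linux).

```python
import socket

NAT_IDLE_TIMEOUT = 350  # seconds; the NAT gateway idle limit quoted above


def enable_keepalive(sock, idle=300, interval=60, count=3):
    """Enable keepalive probes that start before the NAT idle timeout.

    idle, interval and count are illustrative values, not recommendations.
    """
    if idle >= NAT_IDLE_TIMEOUT:
        raise ValueError("idle must be below the NAT gateway idle timeout")
    # Turn keepalive on for this socket.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Tune the probe timing only where the platform exposes the option.
    for name, value in (
        ("TCP_KEEPIDLE", idle),       # idle seconds before the first probe
        ("TCP_KEEPINTVL", interval),  # seconds between probes
        ("TCP_KEEPCNT", count),       # failed probes before the connection is dropped
    ):
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
    return sock
```

The options can be set on a socket before it connects, which is why HTTP libraries usually accept them as connection settings.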

Existing Code:

url = "url to call http"
params = {
   "year": year,
   "month": month
}
response = self.session.get(url, params=params)

As a band-aid, I am currently retrying the call with tenacity:

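from requests.exceptions import HTTPError
from tenacity import retry, retry_if_not_exception_type, stop_after_attempt, wait_fixed

@retry(
    # Retry on anything except HTTPError; requests.exceptions.ConnectionError would be more specific.
    retry=retry_if_not_exception_type(HTTPError),
    reraise=True,  # re-raise the last exception once the attempts are exhausted
    wait=wait_fixed(2),  # wait 2 seconds between attempts
    stop=stop_after_attempt(5),
)
def call_to_api(self, year, month):
    url = "url to call HTTP"
    params = {
        "year": year,
        "month": month,
    }
    return self.session.get(url, params=params)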

So my basic question is: how can I use Python requests correctly to do either of the following?

  • Close the connection before 350 seconds of inactivity

  • Enable Keep-Alive for TCP connections

ankon
A l w a y s S u n n y

2 Answers

Concerning the "Close the connection before 350 seconds of inactivity" problem, there seems to be a read timeout that you can pass through the `timeout` parameter of the session.get() call.

According to the docs, "it’s the number of seconds that the client will wait between bytes sent from the server", which, to me, looks like an inactivity timeout.
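
A minimal sketch of passing that timeout, assuming a `(connect, read)` tuple; the helper name `get_with_timeout` and the values 5 and 300 are illustrative only. Note that this bounds how long a single request waits for data. It does not by itself close an idle connection that is sitting in the session's pool.

```python
import requests

CONNECT_TIMEOUT = 5   # seconds allowed to establish the connection (assumed value)
READ_TIMEOUT = 300    # seconds to wait for data from the server (assumed, below 350)


def get_with_timeout(session: requests.Session, url, params=None):
    """Issue a GET that gives up instead of waiting indefinitely."""
    return session.get(
        url,
        params=params,
        timeout=(CONNECT_TIMEOUT, READ_TIMEOUT),
    )
```

If either limit is exceeded, requests raises a subclass of `requests.exceptions.Timeout` (`ConnectTimeout` or `ReadTimeout`), which the caller can catch and retry.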

Giorgio Ruffa
  • Thanks for the answer. I thought the connection was closed automatically by the requests session, see here: https://stackoverflow.com/a/68124995/1138192. Correct me if I am missing something. – A l w a y s S u n n y Aug 11 '22 at 12:04
  • It seems that the connection is closed if you use the `Session` object as a context manager. That is certainly a good idea and probably best practice too, but I don't think it will solve your problem. For the connection to be closed, the `get` method needs to return so that the context manager's `__exit__` method is called. In your case, it seems that `get` is "stuck" waiting for a response until the NAT gateway times out the connection. – Giorgio Ruffa Aug 11 '22 at 12:10

Posting the solution for future users who face this issue while working with AWS Fargate + NAT.

We need to set the TCP keepalive settings to the values dictated by our server-side configuration. This PR helped me a lot in fixing my issue: https://github.com/customerio/customerio-python/pull/70/files

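import socket

from urllib3.connection import HTTPConnection

# Extend urllib3's default socket options so that new connections use TCP keepalive.
HTTPConnection.default_socket_options = HTTPConnection.default_socket_options + [
    (socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1),  # enable TCP keepalive
    (socket.SOL_TCP, socket.TCP_KEEPIDLE, 300),   # first probe after 300 s idle (below the 350 s NAT limit)
    (socket.SOL_TCP, socket.TCP_KEEPINTVL, 60),   # then probe every 60 s
]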
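
If changing the process-wide default is undesirable, the same options can be scoped to a single session through a custom transport adapter. This is a sketch under assumptions rather than part of the linked PR: `KeepAliveAdapter` and `make_session` are hypothetical names, and the platform-dependent constants are guarded so the code also imports where they are missing.

```python
import socket

import requests
from requests.adapters import HTTPAdapter
from urllib3.connection import HTTPConnection

# Same values as above: 300 s idle and 60 s interval.
KEEPALIVE_OPTIONS = [(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)]
# TCP_KEEPIDLE / TCP_KEEPINTVL are not defined on every platform.
if hasattr(socket, "TCP_KEEPIDLE"):
    KEEPALIVE_OPTIONS.append((socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 300))
if hasattr(socket, "TCP_KEEPINTVL"):
    KEEPALIVE_OPTIONS.append((socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 60))


class KeepAliveAdapter(HTTPAdapter):
    """Hypothetical adapter whose connection pools use keepalive options."""

    def init_poolmanager(self, *args, **kwargs):
        # socket_options is forwarded to the urllib3 connections in the pool.
        kwargs["socket_options"] = (
            HTTPConnection.default_socket_options + KEEPALIVE_OPTIONS
        )
        super().init_poolmanager(*args, **kwargs)


def make_session():
    """Build a session whose HTTP and HTTPS connections use keepalive."""
    session = requests.Session()
    adapter = KeepAliveAdapter()
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session
```

A session built with `make_session()` is then used in place of `self.session`; other sessions in the same process keep the urllib3 defaults.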
A l w a y s S u n n y