I am using Amazon ECS on AWS Fargate. My instances can access the internet, but connections are dropped after 350 seconds of inactivity. Roughly 5 out of every 100 calls made by my service fail with ConnectionResetError: [Errno 104] Connection reset by peer. I found a couple of suggestions for fixing the issue in my server-side code, see here and here.
Cause
If a connection that's using a NAT gateway is idle for 350 seconds or more, the connection times out.
When a connection times out, a NAT gateway returns an RST packet to any resources behind the NAT gateway that attempt to continue the connection (it does not send a FIN packet).
Solution
To prevent the connection from being dropped, you can initiate more traffic over the connection. Alternatively, you can enable TCP keepalive on the instance with a value less than 350 seconds.
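At the socket level, "enable TCP keepalive with a value less than 350 seconds" might look roughly like the sketch below. The helper name enable_tcp_keepalive and the values 120/30/3 are my own assumptions for illustration; the only requirement taken from the text above is that the idle time stays under 350 seconds. The tuning constants are platform-specific (they exist on Linux), so the sketch sets each one only if the socket module exposes it.

```python
import socket


def enable_tcp_keepalive(sock, idle=120, interval=30, count=3):
    """Turn on TCP keepalive probes for an existing socket.

    idle:     seconds of inactivity before the first probe (keep below 350)
    interval: seconds between probes
    count:    unanswered probes before the connection is considered dead
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Per-socket tuning options are not available on every platform,
    # so each one is applied only when the constant exists.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)
    return sock


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        enable_tcp_keepalive(s)
        # Non-zero means keepalive is enabled on this socket.
        print(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE))
```

With these settings the kernel sends a probe after 120 idle seconds, which counts as traffic and should keep the NAT gateway from timing the connection out.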
Existing Code:
url = "url to call http"
params = {
    "year": year,
    "month": month,
}
response = self.session.get(url, params=params)
As a band-aid, I am currently retrying the call with tenacity:
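from requests.exceptions import HTTPError  # assuming HTTPError is the requests one
from tenacity import retry, retry_if_not_exception_type, stop_after_attempt, wait_fixed

@retry(
    # Retry on any exception except HTTPError. The specific error I am
    # seeing is requests.exceptions.ConnectionError.
    retry=retry_if_not_exception_type(HTTPError),
    reraise=True,                # re-raise the last exception once retries run out
    wait=wait_fixed(2),          # wait 2 seconds between attempts
    stop=stop_after_attempt(5),  # give up after 5 attempts
)
def call_to_api(self, year, month):
    url = "url to call HTTP"
    params = {
        "year": year,
        "month": month,
    }
    response = self.session.get(url, params=params)
    return response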
So my basic question is: how can I use Python requests correctly to implement either of the following solutions?
Close the connection before 350 seconds of inactivity
Enable Keep-Alive for TCP connections
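For reference, here is a minimal sketch of how I imagine both options could look with requests; I am not sure it is the correct approach. The names KeepAliveAdapter, IdleClosingSession and build_session are hypothetical, and the values 120/30/3 and 300 are assumptions chosen only to stay below 350 seconds. The keepalive part relies on urllib3 accepting a socket_options argument, which the adapter forwards through init_poolmanager. The idle part relies on Session.close() dropping pooled connections while leaving the session usable.

```python
import socket
import time

import requests
from requests.adapters import HTTPAdapter
from urllib3.connection import HTTPConnection


class KeepAliveAdapter(HTTPAdapter):
    """Option 2: enable TCP keepalive on every pooled connection."""

    # Illustrative values (assumptions); idle only has to stay below 350 s.
    idle, interval, count = 120, 30, 3

    def init_poolmanager(self, *args, **kwargs):
        # Start from urllib3's defaults so existing options are preserved.
        options = list(HTTPConnection.default_socket_options)
        options.append((socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1))
        # Tuning constants are platform-specific (present on Linux).
        for name, value in (
            ("TCP_KEEPIDLE", self.idle),
            ("TCP_KEEPINTVL", self.interval),
            ("TCP_KEEPCNT", self.count),
        ):
            if hasattr(socket, name):
                options.append((socket.IPPROTO_TCP, getattr(socket, name), value))
        kwargs["socket_options"] = options
        super().init_poolmanager(*args, **kwargs)


class IdleClosingSession(requests.Session):
    """Option 1: drop pooled connections after a period of inactivity."""

    max_idle = 300  # seconds; an assumption, chosen to stay below 350

    def __init__(self):
        super().__init__()
        self._last_used = time.monotonic()

    def request(self, method, url, **kwargs):
        if time.monotonic() - self._last_used > self.max_idle:
            # Closes pooled connections; the session remains usable and
            # opens a fresh connection for this request.
            self.close()
        try:
            return super().request(method, url, **kwargs)
        finally:
            self._last_used = time.monotonic()


def build_session():
    """Return a session that combines both options."""
    session = IdleClosingSession()
    adapter = KeepAliveAdapter()
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session
```

With this, self.session = build_session() would replace the plain requests.Session(), and the existing self.session.get(url, params=params) call would stay unchanged. Either option alone might be enough; I would probably still keep the retry as a safety net.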