
I have a Flask app running some computations, and I send requests to it from a Jupyter notebook. The client-side code follows this basic format:

import json
import requests

outputs = []
for batch in request_batches:  # flask_address and request_batches are defined earlier
    response = requests.post(flask_address, json=json.dumps(batch), timeout=3600)
    outputs.append(response)

The idea is to iterate through a series of request batches (batching makes sense for the application) and collect the responses.
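A minimal sketch of the same loop using a single requests.Session and an explicit (connect, read) timeout tuple. It assumes the Flask route reads the body with request.get_json() and answers every batch with JSON, so each batch is passed to json= directly instead of being pre-encoded with json.dumps; post_batches and its default timeouts are hypothetical names chosen for illustration.

```python
import requests


def post_batches(flask_address, request_batches, connect_timeout=10, read_timeout=3600):
    """POST each batch in turn and collect the decoded JSON responses.

    Hypothetical helper: assumes the server returns a JSON body for every batch.
    """
    outputs = []
    # One Session can reuse the TCP connection between batches if the server keeps it open
    with requests.Session() as session:
        for batch in request_batches:
            # requests serializes `batch` itself when it is passed via json=
            response = session.post(
                flask_address,
                json=batch,
                timeout=(connect_timeout, read_timeout),
            )
            # Raise on 4xx/5xx answers instead of silently storing them
            response.raise_for_status()
            outputs.append(response.json())
    return outputs
```

In the notebook this would be called as outputs = post_batches(flask_address, request_batches). If the existing Flask route really does decode the body twice, keep json=json.dumps(batch) instead.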

Normally, for each batch I see the request logged on the Flask side, followed by confirmation of the POST once the computation is complete:

00.000.000.000 - - [29/Apr/2020 02:21:46] "POST //docking HTTP/1.1" 200 -

After one batch finishes, the loop continues and the next request is sent.

The issue is that sometimes the computation on the Flask side takes a little longer than normal (by a few minutes, still well under the timeout I set), and this causes the request loop in the notebook to hang. On the Flask side, the computation finishes successfully and the POST is logged.

On the notebook side, the loop hangs and no further requests are made. This isn't a timeout issue: no timeout errors are raised. The notebook cell just hangs until I interrupt it manually.

When I interrupt, I see the following stack trace:

~/opt/anaconda3/envs/env/lib/python3.7/socket.py in readinto(self, b)
    587         while True:
    588             try:
--> 589                 return self._sock.recv_into(b)
    590             except timeout:
    591                 self._timeout_occurred = True

After interrupting, I can confirm that the response for the batch that hung was never added to outputs. So the Flask app sends a response, but the notebook never receives it. Again, this runs fine about 95% of the time; the roughly 5% of requests that take longer to process are the ones that freeze the loop.

Does anyone know how to go about debugging this?
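One debugging angle: requests does not treat timeout=3600 as a deadline for the whole call. The read part bounds how long the client waits for the next bytes from the server, so a connection that died without the client noticing can leave the loop blocked in recv_into, as in the traceback above, for up to an hour. Temporarily using a (connect, read) timeout tuple with a read value just above the slowest normal batch turns such a stall into a requests.exceptions.ReadTimeout instead of a silent hang. The sketch below reproduces that on localhost, assuming a listening socket that never answers is a fair stand-in for the dropped connection; stalled_listener and seconds_until_read_timeout are hypothetical helpers.

```python
import socket
import time

import requests


def stalled_listener():
    """Hypothetical stand-in for a silently dropped connection.

    The kernel completes the TCP handshake from the listen backlog,
    but nothing ever reads the request or sends a reply.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    return listener


def seconds_until_read_timeout(url, read_timeout):
    """POST once; return seconds waited before ReadTimeout, or None if a reply arrived."""
    start = time.monotonic()
    try:
        # (connect timeout, read timeout): the read limit applies to each wait for data
        requests.post(url, json={"ping": True}, timeout=(5, read_timeout))
    except requests.exceptions.ReadTimeout:
        return time.monotonic() - start
    return None


listener = stalled_listener()
url = "http://127.0.0.1:%d/docking" % listener.getsockname()[1]
waited = seconds_until_read_timeout(url, read_timeout=1)
print("gave up after %.1f s" % waited)
listener.close()
```

If the real loop starts raising ReadTimeout on the slow batches while the Flask log still shows a 200, the response is being lost on the way back rather than never being produced.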

Karl

1 Answer

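It's most likely a silent socket disconnect: while the server is busy computing, the connection carries no traffic, and a device in between (for example a NAT gateway, firewall, or load balancer) can drop a TCP connection that stays idle longer than its limit. See the discussion in this issue: https://github.com/psf/requests/issues/3353

You can set the TCP keepalive timers to counter this. With keepalive enabled, the kernel sends periodic probes on the otherwise idle connection, which keeps it from looking idle to intermediate devices and lets the client notice a dead peer instead of blocking in recv_into until the read timeout expires.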

# Set TCP keepalive options to avoid the hanging HTTP request issue
# Reference: https://stackoverflow.com/a/14855726/2360527
import platform
import socket
import urllib3.connection

platform_name = platform.system()
orig_connect = urllib3.connection.HTTPConnection.connect

def patch_connect(self):
    # Open the connection as usual, then tune keepalive on the new socket
    orig_connect(self)
    if platform_name in ("Linux", "Windows"):
        self.sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
        # First probe after 1 s idle, then every 3 s; drop after 5 unanswered probes
        self.sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 1)
        self.sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 3)
        self.sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 5)
    elif platform_name == "Darwin":
        # macOS names the idle-time option TCP_KEEPALIVE (0x10); older Pythons don't export it
        TCP_KEEPALIVE = getattr(socket, "TCP_KEEPALIVE", 0x10)
        self.sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
        self.sock.setsockopt(socket.IPPROTO_TCP, TCP_KEEPALIVE, 3)

urllib3.connection.HTTPConnection.connect = patch_connect
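The monkeypatch changes HTTPConnection.connect for every urllib3 connection in the process. A more contained alternative, sketched here and not taken from the linked issue, hands keepalive options to urllib3's socket_options connection parameter through a custom requests transport adapter, so only sessions that mount it are affected. KeepAliveAdapter and keepalive_socket_options are hypothetical names, and the timer values simply mirror the Linux branch above. Because urllib3 applies socket_options when it creates each socket, the settings should also reach HTTPS connections.

```python
import socket

import requests
from requests.adapters import HTTPAdapter
from urllib3.connection import HTTPConnection


def keepalive_socket_options(idle=1, interval=3, count=5):
    """Build urllib3 socket_options: its defaults plus keepalive settings."""
    options = list(HTTPConnection.default_socket_options)
    options.append((socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1))
    # Timer constants differ per platform, so only add the ones Python exports
    for name, value in (("TCP_KEEPIDLE", idle), ("TCP_KEEPINTVL", interval), ("TCP_KEEPCNT", count)):
        if hasattr(socket, name):
            options.append((socket.IPPROTO_TCP, getattr(socket, name), value))
    return options


class KeepAliveAdapter(HTTPAdapter):
    """Transport adapter whose connection pools open every socket with the given options."""

    def __init__(self, socket_options, **kwargs):
        # Must be set before super().__init__, which calls init_poolmanager
        self.socket_options = socket_options
        super().__init__(**kwargs)

    def init_poolmanager(self, *args, **kwargs):
        kwargs["socket_options"] = self.socket_options
        super().init_poolmanager(*args, **kwargs)


session = requests.Session()
adapter = KeepAliveAdapter(keepalive_socket_options())
session.mount("http://", adapter)
session.mount("https://", adapter)
```

The notebook loop would then call session.post(...) in place of requests.post(...).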
Zach Johnson