
I am debugging a Python Flask application. The application runs atop uWSGI configured with 6 threads and 1 process. I am using Flask-Executor to offload some slower tasks. These tasks open a connection to the Flask application itself, i.e., the same process, and perform some HTTP GET requests. The executor is configured to use at most 2 threads. This application runs on Ubuntu 16.04.3 LTS.

Every once in a while the threads in the executor completely stop working. The code uses the Python requests library to do the requests. The underlying error message is:

Action failed. HTTPSConnectionPool(host='somehost.com', port=443): Max retries exceeded with url: /api/get/value (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f8d75bb5860>: Failed to establish a new connection: [Errno 11] Resource temporarily unavailable',))

The code that is running within the executor looks like this:

adapter = requests.adapters.HTTPAdapter(max_retries=3)
session = requests.Session()
session.mount('http://somehost.com:80', adapter)
session.headers.update({'Content-Type': 'application/json'})
...
session.get(uri, params=params, headers=headers, timeout=3)

I've spent a good amount of time trying to peel back the Python requests stack down to the C sockets that it uses. I've also tried reproducing this error using small C and Python programs. At first I thought it could be that sockets were not getting closed and so we were running out of allowable sockets as a resource, but that gives me a message more along the lines of "too many files are open".

Setting aside the Python stack, what could cause a [Errno 11] Resource temporarily unavailable on a socket connect() command? Also, if you've run into this using requests, are there arguments that I could pass in to prevent this?

I've seen the StackOverflow post "What can cause a 'Resource temporarily unavailable' on sock send() command", but that concerns a send() call and not the initial connect(), which is where I suspect the code is getting hung up.

Tom Huibregtse

1 Answer


The error message Resource temporarily unavailable corresponds to the error code EAGAIN.
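As a quick check of that mapping, Python's standard `errno` module can translate a numeric code into its symbolic name. This is only a minimal sketch: the helper name `describe_errno` is made up for illustration, and the numeric value differs between operating systems (11 is the Linux value).

```python
import errno
import os


def describe_errno(code):
    """Return 'NAME (number): message' for a numeric errno value."""
    # errno.errorcode maps numbers to symbolic names on the current platform.
    name = errno.errorcode.get(code, "UNKNOWN")
    # os.strerror returns the platform's message text for the code.
    return "{} ({}): {}".format(name, code, os.strerror(code))


if __name__ == "__main__":
    # On Linux this reports the code behind "[Errno 11]".
    print(describe_errno(errno.EAGAIN))
```

On Linux, `EAGAIN` and `EWOULDBLOCK` share the same number, so both names refer to the error seen in the traceback.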

The connect() man page states that the error EAGAIN occurs in the following situation:

No more free local ports or insufficient entries in the routing cache. For AF_INET see the description of /proc/sys/net/ipv4/ip_local_port_range ip(7) for information on how to increase the number of local ports.

This can happen when a large number of connections to the same IP/port combination are in use and no local port can be found for automatic binding. You can check with

netstat -tulpen

exactly which connections cause this.
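The same check can be roughly scripted. The sketch below is not part of the original answer and assumes a Linux system where `/proc/net/tcp` and `/proc/sys/net/ipv4/ip_local_port_range` are readable. The function names are hypothetical. It computes how many local ports are available for automatic binding and counts IPv4 TCP sockets per state; IPv6 sockets live in `/proc/net/tcp6` and are not covered here.

```python
from collections import Counter

# TCP state codes as they appear (in hex) in the "st" column of /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def ephemeral_port_count(range_text):
    """Number of ports in an 'ip_local_port_range' string such as '32768 60999'."""
    low, high = (int(part) for part in range_text.split())
    return high - low + 1  # the range is inclusive on both ends


def count_tcp_states(table_text):
    """Count sockets per TCP state from the text of /proc/net/tcp."""
    counts = Counter()
    for line in table_text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 4:
            continue
        state = fields[3].upper()
        counts[TCP_STATES.get(state, state)] += 1
    return counts


def read_text(path):
    with open(path) as handle:
        return handle.read()


if __name__ == "__main__":
    try:
        ports = ephemeral_port_count(
            read_text("/proc/sys/net/ipv4/ip_local_port_range"))
        states = count_tcp_states(read_text("/proc/net/tcp"))
    except OSError as exc:
        print("could not read /proc (not Linux?):", exc)
    else:
        print("local ports available for automatic binding:", ports)
        for state, count in states.most_common():
            print(state, count)
```

Sockets in states such as TIME_WAIT still occupy a local port, so a per-state count that approaches the size of the port range would be consistent with the explanation above.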

Ctx
  • I understand that FreeBSD's connect() also does this, although `EAGAIN` has a different number (35) on that OS so the Linux assumption is a safe bet. (NetBSD and Mac OSX don't seem to do this if their man pages are to be believed.) – Ian Abbott Feb 26 '20 at 17:02
  • This is running on Linux. Sorry for not making that clear. – Tom Huibregtse Feb 26 '20 at 17:05
  • To be clear, there would need to be more than 28231 local connections open? `# cat /proc/sys/net/ipv4/ip_local_port_range` is `32768 60999`. This would cover the "No more free local ports"? Would "insufficient entries in the routing cache" be another possibility I should be checking? – Tom Huibregtse Feb 26 '20 at 17:14
  • @TomHuibregtse Current linux kernels usually do not use a routing cache any more, so I doubt that – Ctx Feb 26 '20 at 17:17
  • Could this have anything to do with the new system that replaced the routing cache? Or socket send/receive buffers? Perhaps the man page for `connect()` does not enumerate all failures that `EAGAIN` could cover. Also, could this have to do with what `/proc/sys/net/core/somaxconn` is set to, in my case 128? I've been monitoring `netstat -tulpen`. I have not seen that go beyond ~80 connections. – Tom Huibregtse Feb 28 '20 at 17:54
  • We saw the issue again this morning. `netstat -tulpen` was a dead end. Any other ideas? – Tom Huibregtse Mar 25 '20 at 15:48