60

Does seeing the

urllib3.connectionpool WARNING - Connection pool is full, discarding connection

mean that I am effectively losing data (because of the lost connection)
OR
does it mean that the connection is dropped (because the pool is full), but the same connection will be retried later on, when the connection pool becomes available?

JavaFan

2 Answers

60

No data is being lost!

The connection is being discarded after the request is completed (because the pool is full, as mentioned). This means that this particular connection is not going to be re-used in the future.

Because a urllib3 PoolManager reuses connections, it limits how many connections are retained per host to avoid accumulating too many unused sockets. To avoid creating excess sockets when the pool has no idle connection available, configure it with PoolManager(..., block=True); requests then wait until a connection is returned to the pool.

If you're relying on concurrency, it could be a good idea to increase the size of the pool (maxsize) to be at least as large as the number of threads you're using, so that each thread effectively gets its own connection.
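As a minimal sketch of that advice, assuming all requests go to a single host and that you call urllib3 directly (with requests, the pool is configured through its transport adapter instead): `NUM_WORKERS`, `fetch_status` and `fetch_all` are hypothetical names chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

import urllib3

NUM_WORKERS = 8  # hypothetical thread count; use whatever your program runs

# maxsize: how many connections per host are kept in the pool for reuse.
# block=True is optional: with it, a thread waits for a pooled connection
# instead of opening an extra one that would be discarded afterwards.
http = urllib3.PoolManager(maxsize=NUM_WORKERS, block=True)


def fetch_status(url):
    """Request one URL through the shared pool and return its HTTP status."""
    return http.request("GET", url).status


def fetch_all(urls):
    """Fetch all URLs with exactly as many threads as pooled connections."""
    with ThreadPoolExecutor(max_workers=NUM_WORKERS) as executor:
        return list(executor.map(fetch_status, urls))
```

With `maxsize` equal to the number of worker threads, every connection fits back into the pool after use, so the warning should not appear for that host.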

More details here: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#customizing-pool-behavior

Gulzar
shazow
  • That's a very wrong answer and interpretation, judging by the very documentation you mentioned. There's no "retrying later", all connections are opened immediately regardless of pool size. Also, _increasing_ the number of threads without changing `maxsize` (or `pool_size` if different hosts) will not make the warnings go away, it will increase them! – MestreLion Mar 17 '21 at 10:24
  • @MestreLion Rereading it now, I think you're right. My answer was very confusing. I meant that the first part was the correct interpretation ("connection is being dropped"), but the second part that it's being reused was indeed incorrect. I also meant that they should increase the pool size, not the number of threads. I clarified the answer, thanks for pointing it out. – shazow Mar 18 '21 at 14:04
  • As @MestreLion mentions, increasing the `maxsize` made the warnings appear more frequently than before. Then I set `maxsize=1` and they went away... Although the concurrent requests speed slowed down overall. Not sure how to find the right balance between no warnings and fast requests lol. – David May 23 '21 at 15:51
  • @dvdblk: There's no "balance" between warnings and performance: to get no warnings just make your `maxsize` equal to the number of worker threads you're using. That way all connections will be kept in the pool for reuse, hence no warnings. And to improve performance, just increase your worker threads. I've read that around 4-5 threads per CPU core is optimal for internet (i.e. slow) I/O. – MestreLion May 26 '21 at 07:03
  • This is assuming all your connections are to a single host (so you're using a single ConnectionPool) – MestreLion May 26 '21 at 07:05
  • @shazow: the update was a great improvement! But a statement like _"it will limit how many connections are allowed per host"_ is still inaccurate: `urllib3` will **always** open as many connections as you request, even if it discards some after usage. – MestreLion May 26 '21 at 07:10
  • I'm using aiohttp and creating new asyncio tasks dynamically that use this session. So I'm not sure if I can set a precise number for it. – David May 26 '21 at 10:20
  • @dvdblk: Setting worker threads in `aiohttp` is completely out of the scope for this question, but you surely can do it, just check its [documentation on connectors](https://docs.aiohttp.org/en/stable/client_advanced.html#connectors) – MestreLion May 27 '21 at 05:39
  • @MestreLion Another good point. I clarified that sentence and added a note about block=True, also made the answer into a community wiki so you're welcome to edit it further. :) – shazow May 27 '21 at 17:09
15

According to the documentation on Customizing Pool Behavior, neither of your interpretations is correct:

By default, if a new request is made and there is no free connection in the pool then a new connection will be created. However, this connection will not be saved if more than maxsize connections exist. This means that maxsize does not determine the maximum number of connections that can be open to a particular host, just the maximum number of connections to keep in the pool.

(my emphasis)

So connections were not aborted to be retried later. They were made immediately, as requested, and results returned. Then, after they have completed, those "extra" connections were discarded, i.e., they were not kept in the pool for later reuse.

For example, if your maxsize is 10 (the default when using urllib3 via requests), and you launch 50 requests in parallel, all 50 connections will be opened at once, and after completion only 10 will remain in the pool while the other 40 will be discarded (each one issuing that warning).
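The arithmetic in that example can be reproduced with a toy model. This is only a sketch of the keep-at-most-maxsize behaviour described in the documentation, not urllib3's actual code; `ToyPool` and `simulate` are hypothetical names, and the "connections" are plain strings.

```python
import queue


class ToyPool:
    """Toy pool: opens connections on demand, keeps at most maxsize idle."""

    def __init__(self, maxsize):
        self._idle = queue.LifoQueue(maxsize)
        self.opened = 0
        self.discarded = 0

    def get(self):
        # Reuse an idle connection if there is one, otherwise open a new one.
        try:
            return self._idle.get(block=False)
        except queue.Empty:
            self.opened += 1
            return f"conn-{self.opened}"

    def put(self, conn):
        # Return a connection; if the pool is already full, discard it.
        # This is the point where the real warning would be logged.
        try:
            self._idle.put(conn, block=False)
        except queue.Full:
            self.discarded += 1

    def idle_count(self):
        return self._idle.qsize()


def simulate(maxsize, parallel_requests):
    """Check out all connections at once, then return them all."""
    pool = ToyPool(maxsize)
    in_flight = [pool.get() for _ in range(parallel_requests)]
    for conn in in_flight:
        pool.put(conn)
    return pool.opened, pool.idle_count(), pool.discarded


opened, kept, discarded = simulate(maxsize=10, parallel_requests=50)
print(opened, kept, discarded)  # 50 10 40
```

Every request gets its connection and completes; only what happens to the connection afterwards depends on `maxsize`.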

MestreLion