
We are running the following code to upload to GCP buckets in parallel. Judging by the warnings we are seeing, we are quickly using up all the connections in the pool. Is there any way to configure the connection pool the library is using?

import concurrent.futures

# bucket, cloud_path, and content_list are defined elsewhere in our code

def upload_string_to_bucket(content: str):
    blob = bucket.blob(cloud_path)
    blob.upload_from_string(content)

with concurrent.futures.ThreadPoolExecutor() as executor:
    executor.map(upload_string_to_bucket, content_list)

WARNING:urllib3.connectionpool:Connection pool is full, discarding connection: www.googleapis.com
WARNING:urllib3.connectionpool:Connection pool is full, discarding connection: www.googleapis.com
WARNING:urllib3.connectionpool:Connection pool is full, discarding connection: www.googleapis.com
WARNING:urllib3.connectionpool:Connection pool is full, discarding connection: www.googleapis.com
WARNING:urllib3.connectionpool:Connection pool is full, discarding connection: www.googleapis.com
WARNING:urllib3.connectionpool:Connection pool is full, discarding connection: www.googleapis.com
Michael

1 Answer


I had a similar issue when downloading blobs in parallel.

This article may be informative. https://laike9m.com/blog/requests-secret-pool_connections-and-pool_maxsize,89/
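
In short: requests keeps one connection pool per host, with at most pool_maxsize connections per pool, and both knobs are set on an HTTPAdapter. Here is a minimal plain-requests sketch of what the article describes (the session here is illustrative; it is not the session the GCS client uses internally):

import requests

session = requests.Session()
adapter = requests.adapters.HTTPAdapter(
    pool_connections=10,  # number of per-host pools to cache
    pool_maxsize=50,      # connections kept alive per host pool
)
session.mount("https://", adapter)  # use this adapter for all https:// URLs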

Personally, I don't think that increasing the connection pool is the best solution; I prefer to chunk the "downloads" (uploads, in your case) into groups of pool_maxsize.

from typing import Iterable

def chunker(it: Iterable, chunk_size: int):
    """Yield lists of at most chunk_size items from any iterable."""
    chunk = []
    for index, item in enumerate(it):
        chunk.append(item)
        if not (index + 1) % chunk_size:
            yield chunk
            chunk = []
    if chunk:  # don't drop the final, shorter chunk
        yield chunk

# process one chunk at a time, so at most chunk_size uploads are in flight
for chunk in chunker(content_list, 10):
    with concurrent.futures.ThreadPoolExecutor() as executor:
        executor.map(upload_string_to_bucket, chunk)

Of course, instead of waiting for a whole chunk to finish, you can also start the next upload as soon as one completes.
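
A minimal sketch of that idea, assuming urllib3's default pool_maxsize of 10: capping the executor's worker count means a new upload starts the moment a worker frees up, and no more than ten connections are ever needed at once.

with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
    # max_workers matches the default pool size, so connections are reused
    # instead of being created and then discarded
    executor.map(upload_string_to_bucket, content_list)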

worroc
    The problem is we are using the GCP library and they do not externalize the ability to configure the pool. I do like your approach of chunking but the default pool currently is definitely too small – Michael Aug 08 '19 at 11:16
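
For what it's worth, one can still enlarge the pool by mounting a bigger HTTPAdapter on the storage client's underlying session. Note that _http is a private attribute of google.cloud.storage.Client, so this is an unsupported sketch that may break between library versions:

import requests
from google.cloud import storage

client = storage.Client()
adapter = requests.adapters.HTTPAdapter(pool_connections=10, pool_maxsize=50)
# client._http is the authorized requests session the client uses internally;
# mounting the adapter here affects every https request the client makes
client._http.mount("https://", adapter)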