
I made a script the other night that requests a website 10,000 times, each with a different zero-padded URL number (0000 through 9999). I was using the requests library, but it was extremely slow. Here's my current script:

import requests

for num in range(10000):
    num = '{0:04}'.format(num)  # zero-pad, e.g. 42 -> '0042'
    print(num)
    URL = "https://www.site.com/" + num  # requests needs an explicit scheme
    r = requests.get(URL)
    print(r.content)

I've heard aiohttp allows for asynchronous requests, but I'm not sure of the simplest way to do this given what I'm trying to achieve. Any ideas?

flowermia
  • My concurrent-threads solution with a configurable pool size: https://stackoverflow.com/questions/65365783/how-do-connections-recycle-in-a-multiprocess-pool-serving-requests-from-a-single/65466690#65466690; if it's not enough for your case, let me know here. – gore Dec 28 '20 at 05:01
  • Thanks for this, although I'm not entirely sure how to adapt it for my case? – flowermia Dec 28 '20 at 05:07
  • You could try running it from a bash script, but multiprocessing is much easier to implement (see the sketch after these comments). – Shrey Joshi Dec 28 '20 at 05:26
  • How would I go about that? – flowermia Dec 28 '20 at 05:28
  • One good option for your case is `grequests`, built on top of `requests` and `gevent`: https://stackoverflow.com/a/38280387/5973377 (see the sketch below). – Nishant Patel Dec 28 '20 at 06:44
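
For the multiprocessing route mentioned in the comments, a minimal sketch, assuming the same placeholder www.site.com URLs and an arbitrary pool of 30 worker processes; returning r.content (bytes) keeps the data passed back across process boundaries simple:

import requests
from multiprocessing import Pool

def fetch_body(url):
    # Each worker process issues a plain blocking GET and returns the body bytes.
    return requests.get(url).content

if __name__ == '__main__':
    urls = ['https://www.site.com/{0:04}'.format(n) for n in range(10000)]
    with Pool(processes=30) as pool:  # pool size is an assumption; tune to taste
        bodies = pool.map(fetch_body, urls)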
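
And a minimal sketch of the `grequests` suggestion from the last comment; grequests is a third-party package (pip install grequests), and the usual advice is to import it before anything that opens sockets so gevent's monkey-patching takes effect:

import grequests  # installed separately; gevent-patches sockets on import

urls = ['https://www.site.com/{0:04}'.format(n) for n in range(10000)]
pending = (grequests.get(u) for u in urls)
responses = grequests.map(pending, size=30)  # size caps concurrency; 30 is an assumption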

1 Answer

My concurrent-threads solution with a configurable pool size: How do connections recycle in a multiprocess pool serving requests from a single requests.Session object in python?

You asked how to adapt it to your case, so here it is applied to your URL list:

from concurrent.futures.thread import ThreadPoolExecutor
from functools import partial

from requests import Session, Response
from requests.adapters import HTTPAdapter


list_of_urls = ["https://www.site.com/{0:04}".format(num) for num in range(10000)]  # the only line that differs from the linked solution


def thread_pool_execute(iterables, method, pool_size=30) -> list:
    """Thread-pooled requests; returns a list of responses."""
    session = Session()
    # Match the HTTP connection pool to the thread pool so connections are reused.
    session.mount('https://', HTTPAdapter(pool_maxsize=pool_size))
    session.mount('http://', HTTPAdapter(pool_maxsize=pool_size))
    worker = partial(method, session)  # bind the shared session as the first argument
    with ThreadPoolExecutor(pool_size) as pool:
        results = pool.map(worker, iterables)
    session.close()
    return list(results)

def simple_request(session, url) -> Response:
    return session.get(url)

response_list = thread_pool_execute(list_of_urls, simple_request)
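
Since the question asks about aiohttp specifically, here is a minimal asyncio/aiohttp sketch of the same job, assuming the same placeholder URLs; the connector limit of 30 mirrors the pool size above and is an assumption:

import asyncio
import aiohttp

async def fetch(session, url):
    # One GET; the body is read while the response is still open.
    async with session.get(url) as resp:
        return await resp.read()

async def main():
    urls = ['https://www.site.com/{0:04}'.format(n) for n in range(10000)]
    # TCPConnector(limit=30) caps concurrent connections, much like pool_maxsize above.
    async with aiohttp.ClientSession(connector=aiohttp.TCPConnector(limit=30)) as session:
        return await asyncio.gather(*(fetch(session, u) for u in urls))

response_bodies = asyncio.run(main())  # requires Python 3.7+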
gore