I'm building a script that scrapes a website for some information. I expect to be making around 10k GET requests, and I'm speeding it up using multithreading. Something like this:
import requests
from concurrent.futures import ThreadPoolExecutor
with ThreadPoolExecutor(max_workers=N) as pool:
    pool.map(requests.get, urls)
I want my script to be fast, but at the same time I want to be nice to the website's servers (and not be mistaken for a DoS attempt). While I found plenty of questions about how to multithread requests, I couldn't find any about what a reasonable request rate actually is.
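To stay on the safe side, I've been thinking about throttling each worker with a small delay, roughly like the sketch below. The fetch helper and the 0.5 second delay are just placeholders I made up, not values I have any confidence in:

import time
import requests
from concurrent.futures import ThreadPoolExecutor

def fetch(url):
    # Placeholder throttle: sleep half a second after each request,
    # so each worker makes at most ~2 requests per second.
    resp = requests.get(url, timeout=10)
    time.sleep(0.5)
    return resp

with ThreadPoolExecutor(max_workers=N) as pool:
    results = list(pool.map(fetch, urls))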
In this situation, what's a typical limit for the number of concurrent requests (max_workers)?