I have spent some time trying to find the best and fastest way to get the status codes of a huge list of URLs, but I have made no progress.
Here is my code:
import multiprocessing
import time

import requests


def check(url):
    """Send a HEAD request to url and return the HTTP status code as a string."""
    try:
        response = requests.head(url)
    except requests.exceptions.RequestException:
        # Treat any connection/timeout error as a 404 for now
        return "404"
    return str(response.status_code)


def multiprocessing_func():
    url_list = [
        # A huge list of URLs
    ]
    pool = multiprocessing.Pool()
    start = time.time()
    pool.map(check, url_list)
    pool.close()
    pool.join()
    done = time.time()
    print("time: {}".format(done - start))


if __name__ == "__main__":
    multiprocessing_func()
My laptop is a little slow, but here is what I measured:
with 1 URL in url_list, it takes 6 seconds,
with 8 URLs, it takes 10 seconds,
with 32 URLs, it takes 24 seconds,
with 128 URLs, it takes 77 seconds, and so on.
Why does the total time keep growing even though I am using multiprocessing?
I expected it to take roughly 6 or 7 seconds regardless of the list size (about the same as for a single URL), since the requests should run in parallel.
What did I do wrong?
How can I do this as fast as possible (say I have a list of 10000 URLs)?
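For reference, this is the kind of thread-based variant I am considering instead of multiprocessing. It is only a sketch: the use of concurrent.futures.ThreadPoolExecutor, the max_workers=50 value, and the timeout are my own guesses, not something I have benchmarked.

import concurrent.futures

import requests


def check(url):
    """Send a HEAD request to url and return the HTTP status code as a string."""
    try:
        # timeout keeps one slow host from stalling a worker forever (value is a guess)
        response = requests.head(url, timeout=5)
    except requests.exceptions.RequestException:
        return "404"
    return str(response.status_code)


if __name__ == "__main__":
    url_list = [
        # A huge list of URLs
    ]
    # Threads are cheap for I/O-bound work like this, so a fairly large pool seems reasonable
    with concurrent.futures.ThreadPoolExecutor(max_workers=50) as executor:
        status_codes = list(executor.map(check, url_list))
    print(status_codes)

Would something like this scale to 10000 URLs, or is an async approach the better fit here?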
Any suggestion would be appreciated.
Best regards.