I'm trying to download many images from a list of URLs, roughly 10,000 of them. The images vary in size, from a few hundred KB to 15 MB. I'm wondering what the best strategy would be for this task, to minimize the total time to finish and to avoid freezing.
I use this function to save each image:
def save_image(name, base_dir, data):
    with open(base_dir + name, "wb+") as destination:
        for chunk in data:
            destination.write(chunk)
I take the file extension from the URL with this function:
from os.path import splitext
from urllib.parse import urlparse

def get_ext(url):
    """Return the filename extension from url, or ''.

    From: https://stackoverflow.com/questions/28288987/identify-the-file-extension-of-a-url
    """
    parsed = urlparse(url)
    root, ext = splitext(parsed.path)
    return ext  # or ext[1:] if you don't want the leading '.'
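For example (the URL here is just something I made up to illustrate):

    >>> get_ext("https://example.com/images/photo.JPG?width=500")
    '.JPG'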
And to download the images I just do:
import requests

for image in listofimages:
    r = requests.get(image["url"], timeout=5)
    extension = get_ext(image["url"])
    name = str(int(image['ID'])) + "AA" + extension
    save_image(name, "images/", r)
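For the larger files (up to 15 MB) I also wondered whether streaming would help, so the whole file is never held in memory at once. Something like this inside the same loop (stream=True and iter_content are the only changes, and the chunk size is an arbitrary guess on my part):

    # Streamed variant: the body is only fetched as iter_content() is consumed.
    r = requests.get(image["url"], timeout=5, stream=True)
    with open("images/" + name, "wb") as destination:
        for chunk in r.iter_content(chunk_size=8192):  # 8 KB chunks, arbitrary size
            destination.write(chunk)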
Now putting it all together is quite slow. Hence my question.
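For what it's worth, the direction I was considering is a thread pool around the same helpers, roughly like the sketch below (download_one and MAX_WORKERS are just my placeholder names, and the worker count is a guess). I don't know whether threads are the right call here compared to processes or async, or how to pick the number of workers:

    from concurrent.futures import ThreadPoolExecutor, as_completed

    import requests

    MAX_WORKERS = 16  # placeholder value, would need tuning

    session = requests.Session()  # reuse TCP connections across downloads

    def download_one(image):
        # Same steps as the sequential loop, wrapped so it can run in a worker thread.
        url = image["url"]
        extension = get_ext(url)
        name = str(int(image['ID'])) + "AA" + extension
        r = session.get(url, timeout=5)
        r.raise_for_status()
        save_image(name, "images/", r)
        return name

    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:
        futures = {executor.submit(download_one, img): img for img in listofimages}
        for future in as_completed(futures):
            try:
                future.result()
            except Exception as exc:
                # Report the failed URL but keep downloading the rest.
                print("failed:", futures[future]["url"], exc)

Is this a reasonable approach, or is there a better strategy for this volume of files?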