I need to download about 10 million images. After a small experiment downloading the first 1,000, I noticed each one takes ~4.5 seconds (which could probably be sped up somewhat with multiprocessing.Pool), but the bigger problem is that the average image is ~2400x2400 at ~2.2 MB. I can resize them as soon as they are downloaded, but the main bottleneck (currently) is internet bandwidth. Is there a way to download the images directly at a lower resolution?
Sample dummy code:
import requests

resp = requests.get("some_url.jpg")
fn = "local_filename.jpg"  # placeholder destination path
with open(fn, 'wb') as f:
    f.write(resp.content)
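And this is roughly what I had in mind for parallelizing and resizing right after download (a rough sketch assuming Pillow; I used a thread pool rather than multiprocessing.Pool since the work is mostly I/O-bound, and the URL list, filenames, and 512x512 target are just placeholders):

import concurrent.futures
import io

import requests
from PIL import Image

def download_and_resize(url, fn, size=(512, 512)):
    # Download the full-size image, then shrink it immediately
    # so only the small version is kept on disk.
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    img = Image.open(io.BytesIO(resp.content))
    img = img.convert("RGB")  # make sure the mode is JPEG-compatible
    img.thumbnail(size)       # resize in place, preserving aspect ratio
    img.save(fn, "JPEG")

urls = ["some_url_1.jpg", "some_url_2.jpg"]  # placeholder list
with concurrent.futures.ThreadPoolExecutor(max_workers=32) as pool:
    futures = [
        pool.submit(download_and_resize, url, f"img_{i}.jpg")
        for i, url in enumerate(urls)
    ]
    concurrent.futures.wait(futures)

This still transfers the full ~2.2 MB per image before resizing, though, so it doesn't address the bandwidth problem, hence the question.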