
I am downloading a few images from a website whose image URLs follow a sequential numbering pattern.

Every time I run the loop it gets stuck at a random point, for example after 50 images or sometimes after 70, without throwing any kind of error; it simply hangs. How can I handle this so that I download everything in one go without any stops?

import urllib.request
from urllib.error import HTTPError, URLError
import time

for i in range(1, 240):
    # build the local file path and the matching image URL (001.jpg, 002.jpg, ...)
    filepath = r"C:\Users\....\Image collector\\"
    filename = f'{i:03}.jpg'
    fullpath = filepath + filename
    print(fullpath)

    url = f"https://www.somerandomxyzwebsite.com/abc_{i:03}.jpg"

    try:
        urllib.request.urlretrieve(url, fullpath)
    except HTTPError as e:
        print(e)
    except URLError:
        print("Server down or incorrect domain")

print('done')

Can I include a condition so that if the execution time exceeds 1 minute, the same URL is requested again? Is that the right approach, or is there another way to handle this situation?
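
What I have in mind is roughly the sketch below, using a timeout on the request itself rather than measuring the execution time, since urlretrieve has no timeout argument of its own. The 60-second timeout and the single retry are just placeholder values, and I am not sure whether setting the global socket timeout is the right tool for this:

import socket
import urllib.request
from urllib.error import HTTPError, URLError

# urlretrieve() has no timeout argument, but it honours the global socket timeout,
# so a socket operation that hangs should fail after 60 seconds instead of blocking forever
socket.setdefaulttimeout(60)

url = "https://www.somerandomxyzwebsite.com/abc_001.jpg"  # first image as an example
try:
    urllib.request.urlretrieve(url, "001.jpg")
except HTTPError as e:
    print(e)                                   # e.g. 404 -- retrying will not help
except (socket.timeout, URLError):
    # the request hung past 60 s (or the connection failed) -- hit the same URL once more
    urllib.request.urlretrieve(url, "001.jpg")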

Vivi
  • I would set a timeout value, and a retry value. You probably should take a look at this: https://stackoverflow.com/questions/23267409/how-to-implement-retry-mechanism-into-python-requests-library?rq=1 . Hope it helps – Flo Aug 07 '19 at 15:31
  • They may be noticing your rapid connections to their webserver as malicious traffic, or even just as web scraping, and [tarpitting](https://en.wikipedia.org/wiki/Tarpit_(networking)) your connections. You may want to build some `time.sleep` or other delays in your script so that you don't look like an attacker and/or use up all their bandwidth – G. Anderson Aug 07 '19 at 15:31
  • @Flo thanks this is helpful – Vivi Aug 07 '19 at 15:48
  • @G.Anderson I don't think this website is capable of tracking; it's a very small website. I think the problem is that the server is not capable of handling so many requests in a row. So how should I re-request from the point where it got stuck? I could be wrong here, as I am an amateur in the coding/tech space – Vivi Aug 07 '19 at 15:48
  • You may be correct, and @Flo gave you the link that could help in that case. But in general, smaller websites are hosted by companies that often have DDOS protection which may have the effect I described. And if not, then it's still a good idea to include some time between requests so that you don't overwhelm the servers for a small site (even if it's just for the benefit of normal users of the site, or being polite to the site's owner(s)) – G. Anderson Aug 07 '19 at 15:53
  • @G.Anderson yes, this could be the case as well, thank you for educating me on this. I knew about adding a time delay, but I wanted to understand how I could re-request or otherwise overcome this without a delay, and that is where the link Flo shared will help. – Vivi Aug 07 '19 at 16:06
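
Putting the suggestions from the comments above together (a per-request timeout, a bounded number of retries, and a short pause between requests), a minimal sketch could look like the one below. The timeout, retry count, delay values, and the images output directory are placeholders, not values from the thread:

import os
import socket
import time
import urllib.request
from urllib.error import HTTPError, URLError

socket.setdefaulttimeout(60)              # give up on a hung request after 60 s (placeholder value)

out_dir = "images"                        # placeholder output directory
os.makedirs(out_dir, exist_ok=True)

for i in range(1, 240):
    filename = f"{i:03}.jpg"
    fullpath = os.path.join(out_dir, filename)
    url = f"https://www.somerandomxyzwebsite.com/abc_{i:03}.jpg"

    for attempt in range(3):              # at most 3 attempts per image (placeholder)
        try:
            urllib.request.urlretrieve(url, fullpath)
            break                         # success -- move on to the next image
        except HTTPError as e:
            print(url, e)                 # e.g. 404 -- retrying will not help
            break
        except (socket.timeout, URLError) as e:
            print(f"{url}: {e} (attempt {attempt + 1} of 3)")
            time.sleep(5)                 # back off before retrying (placeholder delay)

    time.sleep(1)                         # polite pause between images so the server is not hammered

print("done")

An image is simply skipped after three failed attempts. For more control over retries, the link Flo shared describes the same idea using the requests library together with urllib3's Retry class.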

0 Answers