
I am trying to save pictures from links. I read the URLs from a CSV file one after another, retrieve each image from its URL, and save it to a local folder.

But after about 6000 requests, the Python script just hangs. It doesn't raise any exception, and I don't know what to do.

This is my code snippet:

import urllib.request

img_path = eachrow[11]
try:
    print("Image url ::" + baseURL + img_path)
    # Spoof a browser User-Agent so the server doesn't reject the request
    opener = urllib.request.build_opener()
    opener.addheaders = [('User-Agent', 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1941.0 Safari/537.36')]
    urllib.request.install_opener(opener)
    # Note: urlretrieve has no timeout parameter, so a stalled
    # connection can block here indefinitely
    urllib.request.urlretrieve(baseURL + img_path, image_file_path + img_path)
except Exception as e:
    print(e)
    img_path = "No image"
  • How long have you let it hang? Do you control the server(s) you are hitting with the script? If you rerun the script immediately, does it process again without a problem until after 6000 hits? Does it always stop at the same spot or is it random? – Anthony L Mar 02 '18 at 23:43
  • Hello, I kept the script running for the whole night. I don't have any control over the server. 6000 is not an exact number, but it usually starts after 5000. And if I start the script again, it starts downloading the pictures again. The process is not quite continuous; it hangs after a while. Does that answer your questions? – Dhrubo Saha Mar 02 '18 at 23:44
  • You may want to consider refactoring your code to use requests and set a timeout, so you can skip hanging files: https://stackoverflow.com/questions/32763720/timeout-a-file-download-with-python-urllib – Anthony L Mar 02 '18 at 23:48
  • I just edited my comment with answering all your questions. Thanks – Dhrubo Saha Mar 02 '18 at 23:49
  • I am just wondering, probably this is a server issue, where the server is not allowing the request to happen. So in that case, if I set a timeout, then probably all the other images will fail to save because the server rejects them, right? – Dhrubo Saha Mar 02 '18 at 23:51
  • The timeout would be per request. – Anthony L Mar 02 '18 at 23:52
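Following the suggestion in the comments to set a per-request timeout: `urllib.request.urlretrieve` accepts no timeout, but `urllib.request.urlopen` does, so a stalled connection raises an exception instead of hanging forever. A minimal sketch (the `download_image` helper name and the 30-second default are my own choices, not from the question):

```python
import urllib.request

def download_image(url, dest_path, timeout=30):
    """Fetch url and write the bytes to dest_path.

    Raises socket.timeout (a subclass of OSError) if the server
    stalls for longer than `timeout` seconds, instead of hanging.
    """
    # Set the User-Agent per request rather than via install_opener,
    # which mutates global state
    req = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    with urllib.request.urlopen(req, timeout=timeout) as resp, \
            open(dest_path, 'wb') as out:
        out.write(resp.read())
```

In the CSV loop this would replace the `urlretrieve` call; the existing `except` block then also catches timeouts, logs them, and moves on to the next row, so one unresponsive URL no longer stalls the whole run.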

0 Answers