Python urlretrieve downloading corrupted images

Question

I am downloading a list of images (all .jpg) from the web using this Python script:

__author__ = 'alessio'

import urllib.request

fname = "inputs/skyscraper_light.txt"

with open(fname) as f:
    content = f.readlines()


for link in content:
    try:
        link_fname = link.split('/')[-1]
        urllib.request.urlretrieve(link, "outputs_new/" + link_fname)
        print("saved without errors " + link_fname)
    except:
        pass

In OS X Preview I see the images just fine, but I can't open them with any image editor (for example, Photoshop says "Could not complete your request because Photoshop does not recognize this type of file."), and when I try to attach them to a Word document, the files are not even shown as picture files in the image browsing dialog.

What am I doing wrong?

It could be a problem with file extensions. Maybe OSX preview is smarter than Photoshop/Word and recognizes files that have the wrong extension? Can you give an example url and filename? — André Laszlo, Feb 08 '15 at 17:53
unrelated: do not use `except: pass`, use `except Exception as e: logging.error('failed to download %s: %s', link, e)` instead (add `import logging` at the top) — jfs, Feb 08 '15 at 17:55
does it help if you remove the trailing newline from the filenames? `for link in open(fname): link_fname = os.path.join("outputs_new", url2filename(link).strip())` where [`url2filename()`](http://stackoverflow.com/a/16501351/4279) — jfs, Feb 08 '15 at 18:28
@AndréLaszlo the image URLs were fine, I could save them from the browser — alessiop86, Feb 09 '15 at 06:48
@J.F.Sebastian Thanks, this was the problem! I have just added .strip() to link_fname and now it works just fine :) — alessiop86, Feb 09 '15 at 06:49
@alessiop86: [post your own answer](http://stackoverflow.com/help/self-answer) (and accept it) so that others who want to download multiple files listed in a file would not forget to remove the newline. — jfs, Feb 09 '15 at 13:58

score 0 · Accepted Answer · answered Feb 13 '15 at 06:39

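As J.F. Sebastian suggested in the comments, the issue was the trailing newline in the filename. readlines() keeps the "\n" at the end of each line, so every saved file name ended with a newline character after ".jpg", and Photoshop and Word no longer recognized the extension.

To make the script work, replace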

link_fname = link.split('/')[-1]

with

link_fname = link.strip().split('/')[-1]
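For reference, here is a minimal sketch of the whole script with that fix applied. The function names filename_from_link and download_all are hypothetical, introduced only for this illustration, and the logging call follows the suggestion from the comments rather than the original script. It assumes the input file lists one URL per line.

```python
import logging
import os
import urllib.request


def filename_from_link(link):
    """Return the last path segment of a URL (hypothetical helper)."""
    # Lines read from a file keep their trailing "\n"; strip it first,
    # otherwise the newline ends up as part of the saved file name.
    return link.strip().split('/')[-1]


def download_all(list_path, out_dir):
    """Download every URL listed in list_path into out_dir (hypothetical helper)."""
    os.makedirs(out_dir, exist_ok=True)
    with open(list_path) as f:
        for line in f:
            link = line.strip()
            if not link:
                continue  # skip blank lines
            target = os.path.join(out_dir, filename_from_link(link))
            try:
                urllib.request.urlretrieve(link, target)
                print("saved without errors " + target)
            except Exception as e:
                # Report the failure instead of silently swallowing it.
                logging.error("failed to download %s: %s", link, e)
```

With the paths from the question, this would be called as download_all("inputs/skyscraper_light.txt", "outputs_new").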

alessiop86
