
I am downloading a list of images (all .jpg) from the web using this Python script:

```python
__author__ = 'alessio'

import urllib.request

fname = "inputs/skyscraper_light.txt"

with open(fname) as f:
    content = f.readlines()


for link in content:
    try:
        link_fname = link.split('/')[-1]
        urllib.request.urlretrieve(link, "outputs_new/" + link_fname)
        print("saved without errors " + link_fname)
    except:
        pass
```

In OS X Preview I see the images just fine, but I can't open them with any image editor (for example, Photoshop says "Could not complete your request because Photoshop does not recognize this type of file."), and when I try to attach them to a Word document, the files don't even show up as pictures in the file picker dialog.

What am I doing wrong?

alessiop86
  • Are you using Python 2 or 3? – Malik Brahimi Feb 08 '15 at 17:47
  • @MalikBrahimi `urllib.request` is Python 3. – André Laszlo Feb 08 '15 at 17:51
  • It could be a problem with file extensions. Maybe OSX preview is smarter than Photoshop/Word and recognizes files that have the wrong extension? Can you give an example url and filename? – André Laszlo Feb 08 '15 at 17:53
  • unrelated: do not use `except: pass`, use `except Exception as e: logging.error('failed to download %s: %s', link, e)` instead (add `import logging` at the top) – jfs Feb 08 '15 at 17:55
  • does it help if you remove the trailing newline from the filenames? `for link in open(fname): link_fname = os.path.join("outputs_new", url2filename(link).strip())` where [`url2filename()`](http://stackoverflow.com/a/16501351/4279) – jfs Feb 08 '15 at 18:28
  • @AndréLaszlo the image URLs were fine, I could save them from the browser – alessiop86 Feb 09 '15 at 06:48
  • @J.F.Sebastian Thanks, this was the problem! I have just added .strip() to link_fname and now it works just fine :) – alessiop86 Feb 09 '15 at 06:49
  • @alessiop86: [post your own answer](http://stackoverflow.com/help/self-answer) (and accept it) so that others who want to download multiple files listed in a file would not forget to remove the newline. – jfs Feb 09 '15 at 13:58
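The `url2filename()` helper linked in jfs's comment derives the filename from the URL's path rather than splitting the raw string. The snippet below is only my own minimal sketch of that idea, not the linked code (the name `url_to_filename` is hypothetical): it strips surrounding whitespace, including the trailing newline kept by `readlines()`, takes the last segment of the path via `urllib.parse.urlsplit()` so query strings and fragments are ignored, and percent-decodes the result.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def url_to_filename(url):
    """Hypothetical helper: last path segment of url, usable as a local filename.

    Surrounding whitespace (such as the trailing newline kept by readlines())
    is stripped first; the query string and fragment are ignored because only
    the path component is used.
    """
    path = urlsplit(url.strip()).path
    return unquote(posixpath.basename(path))


print(url_to_filename("http://example.com/img/tower%201.jpg?size=large\n"))
# prints: tower 1.jpg
```

Note that a URL ending in `/` yields an empty string here, so a real script would need to handle that case.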

1 Answer


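As J.F. Sebastian suggested in the comments, the problem was the trailing newline in the filename. `readlines()` keeps the `\n` at the end of each line, so every image was saved under a name ending in `.jpg\n` instead of `.jpg`, which is apparently why Photoshop and Word, unlike Preview, did not recognize the files as images.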
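The effect is easy to reproduce without downloading anything. In this small sketch `io.StringIO` stands in for the input file, and the URL is a made-up example:

```python
import io

# Stand-in for open("inputs/skyscraper_light.txt"); the URL is a made-up example.
f = io.StringIO("http://example.com/photos/tower.jpg\n")
content = f.readlines()

link = content[0]
print(repr(link.split('/')[-1]))          # 'tower.jpg\n'  (newline still attached)
print(repr(link.strip().split('/')[-1]))  # 'tower.jpg'
```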

To make my script work, you need to replace

```python
link_fname = link.split('/')[-1]
```

with

```python
link_fname = link.strip().split('/')[-1]
```
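For reference, here is a sketch of the whole script with the fix applied. Besides `.strip()`, it follows jfs's comment by logging failures instead of using `except: pass`. Skipping blank lines and creating the output folder are my own additions, not part of the fix, and the function names `local_path` and `download_all` are hypothetical.

```python
import logging
import os
import urllib.request


def local_path(link, outdir):
    """Build the output path from the last URL segment, minus surrounding whitespace."""
    return os.path.join(outdir, link.strip().split('/')[-1])


def download_all(fname, outdir):
    """Download every URL listed (one per line) in fname into outdir."""
    os.makedirs(outdir, exist_ok=True)  # create the output folder if it is missing
    with open(fname) as f:
        for line in f:  # iterating the file also keeps the "\n" on each line
            link = line.strip()  # the fix: drop the trailing newline (and spaces)
            if not link:  # skip blank lines, e.g. an empty last line
                continue
            target = local_path(link, outdir)
            try:
                urllib.request.urlretrieve(link, target)
                print("saved without errors " + target)
            except Exception as e:
                # report failures instead of silently ignoring them with "except: pass"
                logging.error('failed to download %s: %s', link, e)
```

To run it against the original input, call `download_all("inputs/skyscraper_light.txt", "outputs_new")`; any download errors are printed to stderr by the logging module.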
alessiop86