Python, IOError: cannot identify image file

Question

I know there are a lot of similar questions, but I don't think any of them is identical to mine. I have a list of image URLs and I want to download the images. But when I try to save them I get this error, and I don't know how to make it work.

This is my code:

import urllib
import cStringIO
from PIL import Image

listOfImagesUrl = [
    'https://cdn.psychologytoday.com/sites/default/files/blogs/1023/2012/09/105928-103553.jpg',
    'http://i.livescience.com/images/i/000/048/264/original/disgusted-101130-02.jpg%3F1324346664',
    'http://barfblog.com/wp-content/uploads/images/disgust.story.jpg',
    'http://cache1.asset-cache.net/gc/148190074-people-making-disgusted-faces-gettyimages.jpg%3Fv%3D1%26c%3DIWSAsset%26k%3D2%26d%3Dww%252BvNwEe%252BXzLnQze1Z2w9KNDivKR%252BEqGJ2cPfDe1oeinIezLX%252B8y1tIG3LNjTbL5'
]

imageNumber = 1

for imageUrl in listOfImagesUrl:
    file = cStringIO.StringIO(urllib.urlopen(imageUrl).read())
    img = Image.open(file)
    img.save("/tmp/test/" + str(imageNumber) + "." + img.format)
    print "DONE: " + str(imageNumber) + " of " + str(len(listOfImagesUrl))
    imageNumber += 1

I solved the URL problem using sleeplessnerd's answer to this Stack Overflow question. The problem was that I had to enable cookies in urllib2.

The second URL gives you a 500 error when loaded with `urllib` and the response body is empty. Do check your responses before trying to read data. — Martijn Pieters, Aug 24 '15 at 08:07
The `%3F` sequence is an encoded question mark. It should not be part of the URL path, you have a query parameter (cache buster probably), encoded into the URL instead. The last URL has the same. — Martijn Pieters, Aug 24 '15 at 08:08
But even if I rewrite my second URL as http://www.livescience.com/images/i/000/048/264/original/disgusted-101130-02.jpg, it still doesn't work. What is happening? Thanks for your help. — Carlos Porta, Aug 24 '15 at 08:24
The end of the [`urllib.urlopen` documentation](https://docs.python.org/2/library/urllib.html#urllib.urlopen) says "Deprecated since version 2.6: The `urlopen()` function has been removed in Python 3 in favor of [`urllib2.urlopen()`](https://docs.python.org/2/library/urllib2.html#urllib2.urlopen)." — martineau, Aug 24 '15 at 08:36
Thank you, I changed to urllib2 and now I get a new error: urllib2.HTTPError: HTTP Error 301: The HTTP server returned a redirect error that would lead to an infinite loop. The last 30x error message was: Moved Permanently. I'll search for a fix for this error now. — Carlos Porta, Aug 24 '15 at 08:42
You might want to try using a different URL for testing, e.g. put the barfblog.com one first in the list. — martineau, Aug 24 '15 at 08:46
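
To illustrate the `%3F` comment above, here is a minimal sketch, assuming the encoded sequences at the end of the second and fourth URLs were meant to be a query string. The `decode_once` helper name is hypothetical; `unquote` is the standard-library function (`urllib.unquote` in Python 2, `urllib.parse.unquote` in Python 3).

```python
try:
    from urllib import unquote          # Python 2
except ImportError:
    from urllib.parse import unquote    # Python 3


def decode_once(url):
    """Undo one level of percent-encoding (hypothetical helper)."""
    return unquote(url)


encoded = ('http://i.livescience.com/images/i/000/048/264/original/'
           'disgusted-101130-02.jpg%3F1324346664')
decoded = decode_once(encoded)
# decoded now ends with '.jpg?1324346664', i.e. a path plus a query string
print(decoded)
```

This only shows what the server was actually being asked for. As the later comments note, fixing the encoding alone did not make the livescience URL load, so treat it as a diagnostic step rather than a fix. Note also that `unquote` decodes every escape in the string, which may not be what you want for URLs with legitimately encoded path characters.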

score 1 · Accepted Answer · answered Aug 24 '15 at 09:03

I switched to urllib2 and restructured your code as shown to provide more error information. It seems that most of your image URLs are no good.

from urllib2 import urlopen, URLError
from cStringIO import StringIO
from PIL import Image

listOfImagesUrl = [
    'http://barfblog.com/wp-content/uploads/images/disgust.story.jpg',
    'https://cdn.psychologytoday.com/sites/default/files/blogs/1023/2012/09/105928-103553.jpg',
    'http://i.livescience.com/images/i/000/048/264/original/disgusted-101130-02.jpg%3F1324346664',
    'http://cache1.asset-cache.net/gc/148190074-people-making-disgusted-faces-gettyimages.jpg%3Fv%3D1%26c%3DIWSAsset%26k%3D2%26d%3Dww%252BvNwEe%252BXzLnQze1Z2w9KNDivKR%252BEqGJ2cPfDe1oeinIezLX%252B8y1tIG3LNjTbL5'
]

for imageNumber, imageUrl in enumerate(listOfImagesUrl, start=1):
    try:
        url = urlopen(imageUrl)
    except URLError as e:
        print "skipping {}".format(imageUrl)
        print "  error: {}".format(e)
        continue
    file = StringIO(url.read())
    img = Image.open(file)
    img.save("/tmp/test/" + str(imageNumber) + "." + img.format)
    print "DONE: " + str(imageNumber) + " of " + str(len(listOfImagesUrl))
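
Following the advice in the comments to check responses before reading data, a small sanity check can be run before the bytes are handed to `Image.open`. This is only a sketch: `looks_like_image` is a hypothetical helper, and the assumption that a usable response has status 200, a non-empty body, and an `image/...` content type is mine, not something the servers in question guarantee.

```python
def looks_like_image(status, content_type, body):
    """Return True if an HTTP response plausibly carries image data.

    Hypothetical helper: it only inspects values already read from the
    response, so it can be used with any HTTP library.
    """
    if status != 200:
        return False            # error or redirect page, not an image
    if not body:
        return False            # empty body, nothing for PIL to identify
    return (content_type or '').lower().startswith('image/')


# Illustrative values only, not real server output.
print(looks_like_image(200, 'image/jpeg', b'\xff\xd8\xff\xe0'))
print(looks_like_image(500, 'text/html', b''))
```

With the loop above, the three arguments could be taken from `url.getcode()`, `url.info().get('Content-Type')` and `url.read()`, skipping the URL with `continue` when the check fails.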

Thank you martineau, your code is much better than mine. One thing I don't understand is why I get those errors. If I open the URLs in a browser, I see an image. What is happening? — Carlos Porta, Aug 25 '15 at 03:22
@Caaarlos: Thanks for accepting my answer even though I didn't know about needing to enable cookies for the urls to work. — martineau, Sep 29 '15 at 14:26
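
For readers wondering what "enabling cookies" looks like, here is a rough sketch, assuming the fix referred to is the usual cookie-jar opener. The linked answer is not reproduced in this thread, so this is not necessarily the exact code that was used, and `make_cookie_opener` is a hypothetical name. The imports fall back to the Python 3 module names when `urllib2` is not available.

```python
try:                                    # Python 2
    from urllib2 import build_opener, HTTPCookieProcessor
    from cookielib import CookieJar
except ImportError:                     # Python 3
    from urllib.request import build_opener, HTTPCookieProcessor
    from http.cookiejar import CookieJar


def make_cookie_opener():
    """Build an opener that stores and resends cookies (hypothetical helper)."""
    jar = CookieJar()
    opener = build_opener(HTTPCookieProcessor(jar))
    return opener, jar


opener, jar = make_cookie_opener()
# opener.open(imageUrl) can then replace urlopen(imageUrl) in the loop above.
```

A server that sets a cookie and then redirects will keep redirecting a client that never sends the cookie back, which would be consistent with the "redirect error that would lead to an infinite loop" message quoted earlier.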
