
I'm trying to download an image using this code:

from urllib import urlretrieve
urlretrieve('http://gdimitriou.eu/wp-content/uploads/2008/04/google-image-search.jpg', 
            'google-image-search.jpg')

It worked. The image was downloaded and can be opened by any image viewer.


However, the code below does not work. The downloaded file is only 2KB and can't be opened by any image viewer.

from urllib import urlretrieve
urlretrieve('http://upload.wikimedia.org/wikipedia/en/4/44/Zindagi1976.jpg', 
            'Zindagi1976.jpg')

Opened as HTML, the downloaded file contains this error page:

    ERROR

The requested URL could not be retrieved

While trying to retrieve the URL: http://upload.wikimedia.org/wikipedia/en/4/44/Zindagi1976.jpg

The following error was encountered:

Access Denied.
Access control configuration prevents your request from being allowed at this time. Please contact your service provider if you feel this is incorrect.

Your cache administrator is nobody. 
Generated Mon, 05 Dec 2011 17:19:53 GMT by sq56.wikimedia.org (squid/2.7.STABLE9)
  • 2KB is usually plain text or HTML. Try changing `'Zindagi1976.jpg'` to `'Zindagi1976.html'` and open it in your browser. The information might help with debugging. (I suspect a header issue.) Please post it here. – Brigand Dec 05 '11 at 17:18
  • It looks like Wikimedia is checking your request. When you navigate to the image in the browser, it sends Wikimedia.org information about your setup (e.g., your [user-agent](http://en.wikipedia.org/wiki/User_agent)). Based on whatever Python sends, it's denying access. I don't know how to fix this using `urlretrieve`. [curl](http://curl.haxx.se/) can probably do what you want, though it's not the nicest solution. – Brigand Dec 05 '11 at 17:28
  • Looks like your request was denied. I would not be surprised if the server is denying access to unknown web agents. – EmFi Dec 05 '11 at 17:29
  • There's no reason to use pastebin. Please post the relevant information directly in your question. – Wilduck Dec 05 '11 at 17:29
  • The thing with `urlretrieve` is that it will take whatever the server returns and save it as that `jpeg` file. This is problematic if the server is returning a `page not found` or other error, because you then have to find out what was actually sent. Delete the extension and open the file in Notepad to see what the server sent you. – jfa Jun 02 '14 at 19:09
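
The check suggested in the comments above (look at what was really saved) can be automated. The following is a minimal sketch, not part of the original thread: the helper names `classify_bytes` and `classify_file` are hypothetical, and it assumes that a real JPEG starts with the bytes `FF D8 FF` while an error page starts with an HTML tag or doctype.

```python
def classify_bytes(head):
    """Guess what the first bytes of a downloaded file contain.

    Hypothetical helper: returns "jpeg", "html", or "unknown".
    """
    # JPEG files begin with the start-of-image marker FF D8, followed by FF.
    if head[:3] == b"\xff\xd8\xff":
        return "jpeg"
    # Error pages such as the squid message in the question are HTML.
    if head.lstrip().lower().startswith((b"<!doctype", b"<html")):
        return "html"
    return "unknown"


def classify_file(path):
    """Read the first 512 bytes of the file at `path` and classify them."""
    with open(path, "rb") as f:
        return classify_bytes(f.read(512))
```

Calling `classify_file('Zindagi1976.jpg')` on the 2KB file from the question would be expected to report HTML rather than JPEG, since the server returned an error page.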

1 Answer


You can download the image with `wget`:

wget http://upload.wikimedia.org/wikipedia/en/4/44/Zindagi1976.jpg

But with the following Python code:

from urllib import urlretrieve
urlretrieve('http://upload.wikimedia.org/wikipedia/en/4/44/Zindagi1976.jpg', 
            'Zindagi1976.jpg')

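you may not be able to download the image. The "Access Denied" page in the question suggests that Wikimedia's servers refuse requests from unknown clients (robots or bots), most likely based on the default User-Agent header that Python sends. Note that robots.txt is only advisory, so it is not what blocks the request. Try emulating a browser.

To do that, send a browser-like User-Agent as part of the request headers: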

('User-agent',
 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) '
 'Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')

For example, in Python 2 you can subclass `FancyURLopener` and override its `version` attribute, which is sent as the User-Agent:

>>> from urllib import FancyURLopener
>>> class MyOpener(FancyURLopener):
...     version = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.8.1.11) Gecko/20071127 Firefox/2.0.0.11'
... 
>>> myopener = MyOpener()
>>> myopener.retrieve('http://upload.wikimedia.org/wikipedia/en/4/44/Zindagi1976.jpg', 'Zindagi1976.jpg')
('Zindagi1976.jpg', <httplib.HTTPMessage instance at 0x1007bfe18>)

This retrieves the file.
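
The code in this answer is Python 2. As a hedged sketch of the same idea on Python 3 (where `FancyURLopener` is deprecated), the User-Agent can be set on a `urllib.request.Request` instead. The function names `build_request` and `download` are hypothetical, and whether the server accepts this particular User-Agent string today is an assumption.

```python
import shutil
import urllib.request

# Browser-like User-Agent string taken from the example above.
BROWSER_UA = ("Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.8.1.11) "
              "Gecko/20071127 Firefox/2.0.0.11")


def build_request(url, user_agent=BROWSER_UA):
    """Create a request object that carries a custom User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def download(url, filename, user_agent=BROWSER_UA):
    """Fetch `url` with the custom User-Agent and save the body to `filename`."""
    request = build_request(url, user_agent)
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses, so an
    # error status does not get silently written to disk as a ".jpg".
    with urllib.request.urlopen(request) as response, open(filename, "wb") as out:
        shutil.copyfileobj(response, out)
    return filename
```

Usage would look like `download('http://upload.wikimedia.org/wikipedia/en/4/44/Zindagi1976.jpg', 'Zindagi1976.jpg')`.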

pyfunc
  • I tried. `NameError: name 'FancyURLopener' is not defined` –  Dec 05 '11 at 17:45
  • @no_access: Thanks! I just edited the question so that it is easier to find by search. – pyfunc Dec 05 '11 at 17:51
  • I'm looking for a quick way to get an HTTP response code from a URL. If the code is `200`, then download the images. Can I get the response code with `MyOpener`? Thanks –  May 26 '12 at 09:13
  • @Organic: Use "Head" request. This is already answered in another SO question at http://stackoverflow.com/questions/107405/how-do-you-send-a-head-http-request-in-python – pyfunc May 29 '12 at 03:19
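
The HEAD-request approach mentioned in the last comment can be sketched as follows. This is a minimal Python 3 illustration under assumptions, not the code from the linked question: `build_head_request` and `status_code` are hypothetical names, and the caller supplies the User-Agent string.

```python
import urllib.error
import urllib.request


def build_head_request(url, user_agent):
    """Create a HEAD request, which asks for headers only and no body."""
    return urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": user_agent}
    )


def status_code(url, user_agent):
    """Return the HTTP status code for `url` without downloading the body."""
    request = build_head_request(url, user_agent)
    try:
        with urllib.request.urlopen(request) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as exceptions; the code is on the error.
        return err.code
```

A caller could then download the image only when `status_code(url, user_agent) == 200`.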