I am trying to download an image from Wikipedia and save it to a file locally (using Python 3.9.x). Following this link I tried:

import urllib.request

http = 'https://en.wikipedia.org/wiki/Abacus#/media/File:Abacus_4.jpg'
urllib.request.urlretrieve(http, 'test.jpg')

However, when I try to open this file (Mac OS) I get an error: The file “test.jpg” could not be opened. It may be damaged or use a file format that Preview doesn’t recognize.

I did some more searching and came across this article, which suggests modifying the User-Agent. Following that, I modified the above code as follows:

import urllib.request

opener=urllib.request.build_opener()
opener.addheaders=[('User-Agent','Mozilla/5.0')]
urllib.request.install_opener(opener)

http = 'https://en.wikipedia.org/wiki/Abacus#/media/File:Abacus_4.jpg'
urllib.request.urlretrieve(http, 'test.jpg')

However, modifying the User-Agent did NOT help and I still get the same error while trying to open the file: The file “test.jpg” could not be opened. It may be damaged or use a file format that Preview doesn’t recognize.

Another piece of information: the downloaded file (that does not open) is 235 KB. But if I download the image manually (Right Click -> Save Image As...) it is 455 KB.

What else am I missing? Thank you!

tikka

1 Answer

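The problem is that you are downloading a web page and saving it with a .jpg extension. The link you used is not a direct image link; it points to a Wikipedia page that displays the photograph. Everything after the # is a fragment that the browser handles locally and that is never sent to the server, so urlretrieve receives the HTML of https://en.wikipedia.org/wiki/Abacus. That is why the photo you saved manually is 455 KB while the file you downloaded is 235 KB.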
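As a quick illustration of the point above, here is a minimal sketch using the standard library's urllib.parse.urldefrag to separate the part of the URL that is requested from the server from the fragment. The variable names are only for illustration.

```python
from urllib.parse import urldefrag

page_url = 'https://en.wikipedia.org/wiki/Abacus#/media/File:Abacus_4.jpg'

# urldefrag splits a URL into the part without the fragment and the
# fragment itself (the text after '#'), which is not sent to the server.
request_url, fragment = urldefrag(page_url)

print(request_url)  # https://en.wikipedia.org/wiki/Abacus
print(fragment)     # /media/File:Abacus_4.jpg
```

So the request is for the Abacus article itself, and the response is HTML rather than JPEG data.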

Instead of this:

http = 'https://en.wikipedia.org/wiki/Abacus#/media/File:Abacus_4.jpg'
urllib.request.urlretrieve(http, 'test.jpg')

Use this:

http = 'https://upload.wikimedia.org/wikipedia/commons/thumb/b/be/Abacus_4.jpg/800px-Abacus_4.jpg'
urllib.request.urlretrieve(http, 'test.jpg')

To get a direct link to any photo you want to download, first open it with your browser's "Open image in new tab" option, then copy the URL from the address bar.
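If you want to catch this kind of mistake in code, one possible approach is to check that the downloaded bytes start with the JPEG signature before writing them to disk. This is only a sketch: the helper names looks_like_jpeg and download_jpeg are made up for this example, and the User-Agent value is the one from the question (some servers may reject requests without one).

```python
import urllib.request

# JPEG files begin with these three bytes.
JPEG_MAGIC = b'\xff\xd8\xff'


def looks_like_jpeg(data: bytes) -> bool:
    """Return True if the data starts with the JPEG signature."""
    return data.startswith(JPEG_MAGIC)


def download_jpeg(url: str, path: str) -> None:
    """Download url and save it to path, refusing non-JPEG responses.

    Hypothetical helper for illustration, not part of urllib.
    """
    # User-Agent value taken from the question; adjust as needed.
    request = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    with urllib.request.urlopen(request) as response:
        data = response.read()

    # An HTML page saved as .jpg fails this check instead of
    # silently producing a file that image viewers cannot open.
    if not looks_like_jpeg(data):
        raise ValueError('Response is not a JPEG; is this a direct image link?')

    with open(path, 'wb') as f:
        f.write(data)


# Example call (requires network access):
# download_jpeg('https://upload.wikimedia.org/wikipedia/commons/thumb/b/be/Abacus_4.jpg/800px-Abacus_4.jpg', 'test.jpg')
```

With the page URL from the question, the response would be HTML, so this sketch would raise a ValueError rather than write an unreadable test.jpg.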

Mahdi Baha
  • Thank you very much! When I go to `https://en.wikipedia.org/wiki/Abacus` and then click on that image it took me to `https://en.wikipedia.org/wiki/Abacus#/media/File:Abacus_4.jpg` (which contains only that image and has .jpg extension). So naively I assumed that was the photo link! So the key is, as you have mentioned, "It is better to open any photo you want to use first with the open image in new tab option in your browser and then copy the url." Brilliant explanation!! – tikka Nov 18 '22 at 22:20