
When I try this line:

import urllib.request

urllib.request.urlretrieve("https://i.redd.it/53tfh959wnv41.jpg", "photo.jpg")

I get the following error:

Traceback (most recent call last):
  File "scraper.py", line 26, in <module>
    urllib.request.urlretrieve("https://i.redd.it/53tfh959wnv41.jpg", "photo.jpg")
  File "/usr/lib/python3.6/urllib/request.py", line 248, in urlretrieve 
    with contextlib.closing(urlopen(url, data)) as fp:
  File "/usr/lib/python3.6/urllib/request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.6/urllib/request.py", line 532, in open
    response = meth(req, response)
  File "/usr/lib/python3.6/urllib/request.py", line 642, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python3.6/urllib/request.py", line 570, in error
    return self._call_chain(*args)
  File "/usr/lib/python3.6/urllib/request.py", line 504, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.6/urllib/request.py", line 650, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found

But the link works fine in my browser. Why does it work in the browser but not when requested from Python? The same code works with other pictures from the same site.

2 Answers


The request returns an image saying "If you are looking for an image, it was probably deleted".

If you check your developer console, the response status is 404.

So what you see is the image host's custom 404 "page" (which is an image).

EDIT:

urlretrieve raises an HTTPError on a 404 status code. If you want to use the contents of the response anyway (even when the status code is 404), you can do the following:

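import urllib.error
import urllib.request

try:
    urllib.request.urlretrieve("https://i.redd.it/53tfh959wnv41.jpg", "photo.jpg")
except urllib.error.HTTPError as e:
    # An HTTPError is also a response object, so its body can still be read.
    with open("error_photo.jpg", "wb") as fp:
        fp.write(e.read())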
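
If you would rather know which case you hit instead of silently writing a file, something like the following might help. It is only a sketch: the function name `download` and the `ok_path` / `error_path` parameters are made up for illustration, and it uses `urlopen` instead of `urlretrieve` so that the status code is available in both cases.

```python
import shutil
import urllib.error
import urllib.request


def download(url, ok_path, error_path):
    """Save the response body and return the HTTP status code.

    ok_path and error_path are hypothetical names used only in this sketch.
    """
    try:
        with urllib.request.urlopen(url) as response:
            with open(ok_path, "wb") as fp:
                shutil.copyfileobj(response, fp)
            return response.status
    except urllib.error.HTTPError as err:
        # HTTPError doubles as a response object, so the error body
        # (here, the placeholder image) can still be read and saved.
        with open(error_path, "wb") as fp:
            fp.write(err.read())
        return err.code
```

Calling `download("https://i.redd.it/53tfh959wnv41.jpg", "photo.jpg", "error_photo.jpg")` should then return the status code, so the caller can tell a real picture from an error payload.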
fodma1
  • Why does the request return this? I can see the picture fine in my browser. – smart_beaver Apr 29 '20 at 15:39
  • The error image says it: the image was deleted, so the URL is no longer valid. The status code is correct, but a response with status code 404 can still carry a payload, which in this specific case is this image. – fodma1 Apr 29 '20 at 15:41
  • Try this one for example: https://stackoverflow.com/users/241921231234/nonexistentuser The status code is 404, but your browser still displays the page. – fodma1 Apr 29 '20 at 15:42
  • If it is possible to see the picture in the browser, it should also be possible to save it? – smart_beaver Apr 29 '20 at 15:47
  • If I try the link https://i.redd.it/53tfh959wnv41.jpg in my browser, I can see a picture. That is not the 404 error message. Why? – smart_beaver Apr 29 '20 at 16:06
  • The browser displays its default 404 error message only if no payload is returned in the response body. Sites, however, are free to customize what your browser displays when the requested resource is not found. In this case it is an image saying "If you are looking for an image, it was probably deleted". – fodma1 Apr 29 '20 at 16:31
  • But I do get the intended image in my browser. I only get the "If you are looking for an image, it was probably deleted" image when I request it with Python. – smart_beaver Apr 29 '20 at 16:48
  • Please double-check the URL; I get the 404 error image even in my browser. It could be that the image was deleted but is still in your browser's local cache, so try opening it in private mode. – fodma1 Apr 29 '20 at 16:49
  • You are right, the link does not work for me in private mode. – smart_beaver Apr 29 '20 at 20:53
  • The answer to the question "why does it work in my browser" is that your browser cached it. – Mike Szyndel Feb 16 '23 at 16:24
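
To see for yourself that a 404 response can carry a body, here is a rough, self-contained demo. Everything in it is an assumption made for illustration: the handler class, the `PLACEHOLDER` bytes, and the local server on 127.0.0.1 merely stand in for the real image host. The empty `ProxyHandler` is only there so the request goes straight to the local server.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Stand-in for the site's "image was deleted" picture (made-up bytes).
PLACEHOLDER = b"placeholder image bytes"


class NotFoundWithBody(BaseHTTPRequestHandler):
    """Hypothetical handler: always answers 404, but with a payload."""

    def do_GET(self):
        self.send_response(404)
        self.send_header("Content-Type", "image/jpeg")
        self.send_header("Content-Length", str(len(PLACEHOLDER)))
        self.end_headers()
        self.wfile.write(PLACEHOLDER)

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch_404_demo():
    # Port 0 lets the OS pick a free port.
    server = HTTPServer(("127.0.0.1", 0), NotFoundWithBody)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    # An empty ProxyHandler bypasses any proxy configured in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        url = "http://127.0.0.1:%d/missing.jpg" % server.server_port
        try:
            opener.open(url)
        except urllib.error.HTTPError as err:
            # The error object exposes status, headers, and the body.
            return err.code, err.headers.get("Content-Type"), err.read()
        return None
    finally:
        server.shutdown()
        server.server_close()
```

`fetch_404_demo()` should hand back the 404 status together with the content type and the payload bytes, which is the same situation as the deleted image: an error status plus a body that a browser will happily render.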

Try to change user-agent. You can just add a kwarg:

req = urllib.request.urlretrieve("https://i.redd.it/53tfh959wnv41.jpg", "photo.jpg", headers={"User-Agent": "put custom user agent here"})
floordiv
  • I get this error: TypeError: urlretrieve() got an unexpected keyword argument 'headers' – smart_beaver Apr 29 '20 at 15:45
  • Sorry, my fault. See this thread: [how to use urlretrieve with custom user-agent](https://stackoverflow.com/questions/45247983/urllib-urlretrieve-with-custom-header) – floordiv Apr 29 '20 at 15:54
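
For reference, one way to send a custom User-Agent with the standard library is to build a `urllib.request.Request` with a `headers` dict and pass it to `urlopen`, since `urlretrieve` itself accepts no `headers` argument. This is a minimal sketch; the helper names `build_request` and `download_with_user_agent` are invented here. Note that it will not help when the resource really is gone, as turned out to be the case for this URL.

```python
import shutil
import urllib.request


def build_request(url, user_agent):
    # Request accepts a headers dict; urlretrieve does not.
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def download_with_user_agent(url, path, user_agent):
    """Hypothetical helper: fetch url with a custom User-Agent and save it."""
    request = build_request(url, user_agent)
    with urllib.request.urlopen(request) as response, open(path, "wb") as fp:
        shutil.copyfileobj(response, fp)
```

Usage would look like `download_with_user_agent(url, "photo.jpg", "put custom user agent here")`; a 404 still raises `urllib.error.HTTPError` as before.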