What @MoetazBrayek says in their comment (but not answer) is correct: the website you're querying is blocking the request.
It's common for sites to block requests based on the User-Agent or Referer header. If you try to curl https://www.thesun.co.uk/wp-content/uploads/2020/09/67d4aff1-ddd0-4036-a111-3c87ddc0387e.jpg
you will get an HTTP error (403 Access Denied):
❯ curl -I https://www.thesun.co.uk/wp-content/uploads/2020/09/67d4aff1-ddd0-4036-a111-3c87ddc0387e.jpg
HTTP/2 403
Apparently The Sun wants a browser's user-agent, and specifically the string "mozilla" is enough to get through:
❯ curl -I -A mozilla https://www.thesun.co.uk/wp-content/uploads/2020/09/67d4aff1-ddd0-4036-a111-3c87ddc0387e.jpg
HTTP/2 200
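The same check can be reproduced from Python. This is only a rough sketch: `head_status` and its parameters are names made up for illustration, and it assumes the server answers HEAD requests the same way it answers GET. When no User-Agent is passed, urllib sends its own default (`Python-urllib/3.x`), which is presumably what gets your original request rejected.

```python
import urllib.error
import urllib.request


def head_status(url, user_agent=None, timeout=30):
    """Send a HEAD request and return the HTTP status code (hypothetical helper)."""
    # With no explicit header, urllib falls back to "Python-urllib/<version>".
    headers = {"User-Agent": user_agent} if user_agent else {}
    req = urllib.request.Request(url, headers=headers, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as res:
            return res.status
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx; the status code is available on the exception.
        return err.code
```

Calling `head_status(URL)` and then `head_status(URL, "mozilla")` should mirror the two curl runs above, provided the site still behaves the same way.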
You will have to either switch to the requests package or replace your URL string with a proper urllib.request.Request object so you can customise more pieces of the request. And apparently urlretrieve does not support Request objects, so you will also have to use urlopen:
import shutil
import urllib.request

# URL and filename are the values from your own code
req = urllib.request.Request(URL, headers={'User-Agent': 'mozilla'})
res = urllib.request.urlopen(req)
assert res.status == 200
with open(filename, 'wb') as out:
    shutil.copyfileobj(res, out)
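If you need this in more than one place, the steps above can be wrapped in a small function. This is a minimal sketch, not a drop-in replacement for urlretrieve: the name `download` and the `user_agent` and `timeout` parameters are my own choices, and "mozilla" is only the value that happened to work for this site.

```python
import shutil
import urllib.request


def download(url, filename, user_agent="mozilla", timeout=30):
    """Fetch url with a custom User-Agent and stream the body to filename."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses, so a
    # blocked request fails here before any file is created or overwritten.
    with urllib.request.urlopen(req, timeout=timeout) as res:
        with open(filename, "wb") as out:
            shutil.copyfileobj(res, out)
    return filename
```

Using the response as a context manager closes the connection once the copy finishes, and copyfileobj streams the body in chunks, so large images are not held in memory all at once.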