Recently, I have been trying to download some images from a website. I found the displayed image element in the page's HTML and opened the image URL in a new tab, but it returned a 403 Forbidden page. When I copied the URL and inserted it into another page's HTML, the image displayed successfully. What is the reason for this, and what can I do to download the image? (I am trying to download it with Python's `requests.get()`.) Thank you.
That's quite strange. Since the image doesn't show up when you open the URL in a new tab, it's not a `User-Agent` issue, and since it loads successfully when inserted into another HTML page, it's probably not a `Referer` issue. Can you post links to both the image and the page the image was on? – GordonAitchJay Mar 12 '20 at 16:08
https://tw.manhuagui.com/comic/35275/481200.html This is the link of a comic website, and the image is actually that comic page. – HA HA chan Mar 12 '20 at 17:27
Please provide a [mcve]. – AMC Mar 12 '20 at 19:00
2 Answers
Some websites block requests that lack a `User-Agent` header. Try this:

    headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'}
    requests.get(url, headers=headers)

Reference: "Python requests. 403 Forbidden"

Thank you for your answer. I already tried it and it still returns 403 Forbidden. Actually, this problem also appears when I manually open the image URL in my browser. – HA HA chan Mar 12 '20 at 17:25
This web server checks the `Referer` header when you request the image. To successfully download the image, the `Referer` must be the page the image is on. It doesn't care about the `User-Agent`. I assume the image showed up when you put it in another page because your browser cached the image, and did not actually request it from the server again.
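If the image URL is already known, the `Referer` check described above can be satisfied without a browser. The following is a minimal sketch, not a definitive implementation: the helper names `referer_headers`, `filename_from_url`, and `download_image` are hypothetical, the short `User-Agent` default is an arbitrary placeholder, and `session` is assumed to be a `requests.Session()` (or anything with a compatible `get` method).

```python
import os
from urllib.parse import urlsplit


def referer_headers(page_url, user_agent="Mozilla/5.0"):
    """Build headers that name the hosting page as the Referer (hypothetical helper)."""
    return {"User-Agent": user_agent, "Referer": page_url}


def filename_from_url(image_url):
    """Return the last path segment of the URL, ignoring the query string."""
    return os.path.basename(urlsplit(image_url).path)


def download_image(session, image_url, page_url):
    """Fetch image_url with Referer set to page_url and save it in the current directory.

    `session` is assumed to expose a requests-style get(url, headers=...) method.
    """
    response = session.get(image_url, headers=referer_headers(page_url))
    # With a requests.Session this raises on 4xx/5xx responses such as 403.
    response.raise_for_status()
    filename = filename_from_url(image_url)
    with open(filename, "wb") as f:
        f.write(response.content)
    return filename


# Usage sketch (needs the third-party requests package and network access):
#   import requests
#   download_image(requests.Session(), image_url, page_url)
# where page_url is the comic page and image_url is the src of its image.
```

Passing the session in, rather than creating it inside the function, keeps the helper easy to try with a stand-in object before pointing it at a real server.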
By using your browser's network monitor tool, you can see how your browser got the image's URL. In this case, the URL wasn't a part of the original HTML document. Your browser executed some JavaScript that unpacked the URL and inserted an `img` element into the `div` element with `id="mangaBox"`. Because of this, you can't use vanilla `requests`, as it doesn't execute JavaScript. I used Requests-HTML.
The code below downloads the image from the link you gave in your comment, and saves it to disk:

    import os
    import urllib.parse

    from requests_html import HTMLSession

    session = HTMLSession()
    # The image server only serves the image when the Referer is the comic page.
    session.headers.update({
        "User-Agent": r"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:73.0) Gecko/20100101 Firefox/73.0",
        "Referer": r"https://tw.manhuagui.com/comic/35275/481200.html",
    })

    url = r"https://tw.manhuagui.com/comic/35275/481200.html"
    response = session.get(url)
    print(response, len(response.content))

    # Run the page's JavaScript so that the img element is inserted.
    response.html.render()
    img = response.html.find("img#mangaFile", first=True)
    print("img element:", img)

    url = img.attrs["src"]
    print("image url:", url)
    response = session.get(url)
    print(response, len(response.content))

    # Name the file after the last path segment of the image URL.
    filename = os.path.basename(urllib.parse.urlsplit(url).path)
    print("filename:", filename)
    with open(filename, "wb") as f:
        f.write(response.content)

Output:

    <Response [200]> 6715
    img element: <Element 'img' alt='在地下城寻找邂逅难道有错吗? 第00话' id='mangaFile' src='https://i.hamreus.com/ps3/z/zdxcxzxhndyc_sddc/第00话/P0018.jpg.webp?cid=481200&md5=aAAP75PBy9DIa0bb8Hlwfw' class=('mangaFile',) data-tag='mangaFile' style='display: block; transform: rotate(0deg); transform-origin: 50% 50% 0px;' imgw='907'>
    image url: https://i.hamreus.com/ps3/z/zdxcxzxhndyc_sddc/第00话/P0018.jpg.webp?cid=481200&md5=aAAP75PBy9DIa0bb8Hlwfw
    <Response [200]> 186386
    filename: P0018.jpg.webp

For what it's worth, a whole heap of image URLs, in addition to the main image of the current page, are packed in the last `script` element of the original HTML document.

    <script type="text/javascript">window["\x65\x76\x61\x6c"](function(p,a,c,k,e,d)...
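
The packed script above can be pulled out of the raw HTML without executing any JavaScript (the hex escapes `\x65\x76\x61\x6c` spell `eval`). The following is a rough sketch using only the standard library; the names `ScriptCollector` and `last_script_text` are hypothetical, and it only returns the packed text. Decoding the payload into image URLs is not attempted here.

```python
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collect the text content of every <script> element, in document order."""

    def __init__(self):
        super().__init__()
        self.scripts = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")  # start a new, initially empty script body

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Script bodies may arrive in several chunks, so concatenate them.
        if self._in_script:
            self.scripts[-1] += data


def last_script_text(html):
    """Return the body of the last <script> element, or None if there is none."""
    parser = ScriptCollector()
    parser.feed(html)
    parser.close()
    return parser.scripts[-1] if parser.scripts else None
```

The page text would come from an ordinary request for the comic page, for example `last_script_text(response.text)`.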

It works! Thank you for your answer. I decided to use `requests` with the `Referer` header because I got some errors when running `response.html.render()`. Anyway, you figured out the problem and that's enough for me. – HA HA chan Mar 13 '20 at 13:32
Great! Don't forget to accept and/or vote up any helpful answers, as per [What should I do when someone answers my question?](https://stackoverflow.com/help/someone-answers). By the way, was the error something like `pyppeteer.errors.NetworkError` or `This event loop is already running`? – GordonAitchJay Mar 13 '20 at 13:40