Can't download image from a website with Selenium; it gives 403 error

Question

I was trying to scrape pictures of a certain character on Pixiv with Selenium, but when I tried to download one, I got a 403 error. I tried using the requests module with the src link to download it directly, and it gave me the same error. I tried opening a new tab with the src link and got the same error again. Is there a way to download an image from Pixiv? I was planning something a bit larger than downloading a single image, but I am stuck on this step. I did set the user-agent, as this thread suggested, but either it didn't work or I did something wrong.

This is the image I tried to download: https://www.pixiv.net/en/artworks/93284987.

from selenium import webdriver

driver = webdriver.Chrome()
driver.get('https://i.pximg.net/img-original/img/2021/10/07/19/41/28/93284987_p0.jpg')

I left out the code that gets the src link. What happens is that I get the src link to the image, but I can't open that URL to download it. I don't know if I actually need to navigate to the page, but I can't do anything with the src either. I tried several methods, but nothing works. Can someone help me?

You'll want to click the download link. If the site doesn't provide one, you can send the request in your native code. If that doesn't work, you can screenshot the WebElement. The 403 indicates that you can't link directly to the image like that. (The request may need to include other things to complete...) Even if you could, Selenium would not have a DOM to interact with. — pcalkins, Oct 08 '21 at 21:53
Yeah, I don't think it has a download link. I tried clicking the link in the a tag, but it just expanded the image and showed a new div with the full image and a src. What did you mean when you said to send a request in my native code? — bringand1, Oct 08 '21 at 22:42
I don't know python, but in Java you'd use FileUtils.copyURLToFile. You may need to send more than just the URL, though... the 403 error suggests you need more in your HTTP request than just the URL... (some kind of authentication maybe...) I would check with the site to see if they have an API available for this sort of thing. Else you might need to take a close look at the request when it's made in your browser. (or wireshark or something like that...) — pcalkins, Oct 08 '21 at 22:49
Yep, I probably need to send more. A simple request gave me a 403 error. Do you know what I should include? When you say authentication, do you mean some kind of login? If so, the script is already logged in; it is one of the first things I do, because I can't even follow a path to the image without logging in. — bringand1, Oct 08 '21 at 22:55
Yep, you might need a session token or something like that... or possibly just a referrer... (some sites check the referrer to see that the link comes from the site itself.) — pcalkins, Oct 08 '21 at 23:41
I got it. I just needed to add the referer to the header, as pcalkins said. Thank you! — bringand1, Oct 10 '21 at 18:13
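
The fix described in the comment above can be sketched roughly as follows. This is a minimal illustration, not the asker's actual code: the thread only confirms that adding a Referer header resolved the 403, so using the artwork page as the Referer value is an assumption, and build_image_request and download_image are hypothetical helper names. The sketch uses the standard-library urllib.request so that it has no third-party dependencies.

```python
import urllib.request

# Assumption: the artwork page is used as the Referer value. The thread only
# confirms that adding a Referer header made the 403 go away.
ARTWORK_PAGE = "https://www.pixiv.net/en/artworks/93284987"
IMAGE_URL = "https://i.pximg.net/img-original/img/2021/10/07/19/41/28/93284987_p0.jpg"


def build_image_request(image_url, referer):
    """Return a GET request for image_url that carries a Referer header."""
    return urllib.request.Request(image_url, headers={"Referer": referer})


def download_image(image_url, referer, filename):
    """Fetch image_url with the Referer header and write the bytes to filename."""
    request = build_image_request(image_url, referer)
    # urlopen raises urllib.error.HTTPError on a 403, so a rejected request
    # fails loudly before the output file is even created.
    with urllib.request.urlopen(request) as response, open(filename, "wb") as out_file:
        out_file.write(response.read())
```

Calling download_image(IMAGE_URL, ARTWORK_PAGE, "93284987_p0.jpg") would then attempt the download. With the requests module, the same header dictionary can be passed as requests.get(url, headers={"Referer": referer}).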

score 0 · Answer 1 · answered Oct 09 '21 at 16:56

Seems to download just fine without any headers for me.

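import shutil

import requests

url = 'https://i.pximg.net/img-master/img/2021/10/07/19/41/28/93284987_p0_master1200.jpg'
local_filename = url.split('/')[-1]

# Stream the body so the whole image is not held in memory at once.
with requests.get(url, stream=True) as response:
    # Fail on a 403 (or any other HTTP error) instead of saving an error page.
    response.raise_for_status()
    with open(local_filename, 'wb') as out_file:
        shutil.copyfileobj(response.raw, out_file)
print(local_filename)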

answered Oct 09 '21 at 16:56 by Alan

Why did you use the shutil module? Can't it be done with the standard library? – bringand1 Oct 10 '21 at 18:13
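
For context on the question above: shutil is itself part of the Python standard library, and shutil.copyfileobj essentially reads from one file-like object and writes to another in chunks. A rough equivalent with a plain read/write loop might look like the sketch below. copy_in_chunks is a hypothetical helper name, and the in-memory BytesIO objects merely stand in for response.raw and the output file.

```python
import io


def copy_in_chunks(source, destination, chunk_size=64 * 1024):
    """Copy one binary file-like object to another in fixed-size chunks.

    Returns the total number of bytes written.
    """
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # an empty bytes object signals the end of the stream
            break
        destination.write(chunk)
        total += len(chunk)
    return total


# Stand-in for response.raw: any object with a binary read() method works.
fake_body = io.BytesIO(b"\xff\xd8" + b"\x00" * 100_000)
saved = io.BytesIO()
written = copy_in_chunks(fake_body, saved)
```

In the answer's code, copy_in_chunks(response.raw, out_file) could replace the shutil.copyfileobj call; the shutil version is simply shorter.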
