0

I was trying to scrape pictures with Selenium of a certain character on Pixiv, but when I tried to download, it gave me a 403 error. I tried using the request module with the src link to download it directly and it gave me the error. I tried opening a new tab with the src link and it gave me the same error. Is there a way to download a image from Pixiv? I was planning something a bit larger than just downloading a single image, but I am stuck in it. I did put the user-agent, as this thread suggested, but or it didn't work or I did something wrong.

This is the image I tried to download: https://www.pixiv.net/en/artworks/93284987.

from selenium import webdriver

driver = webdriver.Chrome()
driver.get('https://i.pximg.net/img-original/img/2021/10/07/19/41/28/93284987_p0.jpg')

I skipped the code to get it here, but what happens is that I get the src link to the image, but I can't access the page to download it. I don't know if I need to actually go to the page, but I can't do anything with the src either. I tried several methods but nothing works. Can someone help me?

bringand1
  • 103
  • 1
  • 8
  • you'll want to click the download link. If the site doesn't provide one, you can send the request in your native code. If that doesn't work, you can screenshot the webelement. The 403 indicates that you can't link directly to the image like that. (It may need the request to include other things to complete...) Even if you could, Selenium would not have a DOM to interact with. – pcalkins Oct 08 '21 at 21:53
  • yeah, I don't think it has a download link. I tried clicking in the link of the a tag, but it just expanded the image and showed a new div with the full image and a src. What did you mean when you said to send a request in my native code? – bringand1 Oct 08 '21 at 22:42
  • I don't know python, but in Java you'd use FileUtils.copyURLToFile. You may need to send more than just the URL, though... the 403 error suggests you need more in your HTTP request than just the URL... (some kind of authentication maybe...) I would check with the site to see if they have an API available for this sort of thing. Else you might need to take a close look at the request when it's made in your browser. (or wireshark or something like that...) – pcalkins Oct 08 '21 at 22:49
  • yep, I probably need send more. A simple request gave me a 403 error. Do you know what I should put? When you say authentication, you mean a kind of login? If so, I already am logged in the script, it is one of the first things I do, because I can't even follow a path to the image without logging. – bringand1 Oct 08 '21 at 22:55
  • yep, you might need a session token or something like that... or possibly just a referrer... (some site's check referrer to see that the link is from the site.) – pcalkins Oct 08 '21 at 23:41
  • have you logged in ? if not try logging into application . – Jayanth Bala Oct 09 '21 at 07:10
  • 1
    I got it. I just needed to add the referer to the header, as pcalkins said. Thank you! – bringand1 Oct 10 '21 at 18:13

1 Answers1

0

Seems to download just fine without any headers for me.

import requests
import shutil

url = 'https://i.pximg.net/img-master/img/2021/10/07/19/41/28/93284987_p0_master1200.jpg'
response = requests.get(url, stream=True)
local_filename = url.split('/')[-1]
with open(local_filename, 'wb') as out_file:
    shutil.copyfileobj(response.raw, out_file)
del response
print(local_filename)

Alan
  • 71
  • 3