
I'm learning Python's requests library so that I can automatically download some images from their links. But the images I'm trying to download are behind Cloudflare, so I get ERROR 1020: Access Denied.

Here's my code:

import requests
from bs4 import BeautifulSoup

# -------------------------------------------------------------------------------------------------------

response = requests.get("https://main_link").text
soup = BeautifulSoup(response, 'html.parser')

for i, link in enumerate(soup.find_all('img')):         # getting all image elements
    l = link.get('src')                                 # image link -> https://link/link/image.jpg
    img_data = requests.get(l).content
    with open(f'Test{i}.png', 'wb') as f:
        f.write(img_data)
    

I looked at some resources, like this Stack Overflow question, which says to use cfscrape. This is my code using it:

import requests
import cfscrape
from bs4 import BeautifulSoup

# ------------------------------------------------------------------------------------------------------
scraper = cfscrape.create_scraper()  

response = scraper.get("https://main_link").text
soup = BeautifulSoup(response, 'html.parser')

for i, link in enumerate(soup.find_all('img')):
    l = link.get('src') # https://link/link/image.jpg
    img_data = scraper.get(l).content
    with open(f'Test{i}.png', 'wb') as f:
        f.write(img_data)
    

But I still get ERROR 1020.

I also tried the cloudscraper library, and that doesn't work either.

I've looked at other resources but can't figure out what to do. Any help is appreciated.
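A side note on both loops above: they write whatever body the server returns, so when Cloudflare blocks a request (error 1020 is typically served as an HTTP 403 with an HTML page), that HTML is saved under a Test{i}.png name and the failure goes unnoticed. The sketch below is one way to make the failure visible; it does not get past the block itself. It assumes src values may be relative, takes the file extension from the image URL instead of always using .png, and treats a response as an image only if it succeeded and its Content-Type starts with image/. The helper names resolve_src, output_name, save_if_image and download are hypothetical.

```python
import os
from typing import Optional
from urllib.parse import urljoin, urlparse

import requests


def resolve_src(page_url: str, src: Optional[str]) -> Optional[str]:
    """Return the absolute URL for an <img src> value, or None if src is missing/empty."""
    if not src:
        return None
    # Relative paths like "/img/a.jpg" or "b.png" are resolved against the page URL.
    return urljoin(page_url, src)


def output_name(image_url: str, index: int, default_ext: str = ".jpg") -> str:
    """Build a file name like 'Test3.jpg', reusing the extension from the URL path."""
    ext = os.path.splitext(urlparse(image_url).path)[1] or default_ext
    return f"Test{index}{ext}"


def save_if_image(response: requests.Response, path: str) -> bool:
    """Save response.content to path only for a successful image response.

    Returns True if the file was written. Anything else, such as an HTML
    "Access denied" page, is reported and skipped instead of being saved.
    """
    content_type = response.headers.get("Content-Type", "")
    if not response.ok or not content_type.startswith("image/"):
        print(f"skipped {response.url}: HTTP {response.status_code}, "
              f"Content-Type {content_type or 'missing'}")
        return False
    with open(path, "wb") as f:
        f.write(response.content)
    return True


def download(session: requests.Session, page_url: str, src: Optional[str], index: int) -> bool:
    """Fetch one <img src> (e.g. with requests.Session() or the question's scraper) and save it if it is an image."""
    url = resolve_src(page_url, src)
    if url is None:
        return False
    return save_if_image(session.get(url, timeout=30), output_name(url, index))
```

In the question's loop, the body would then shrink to something like download(scraper, "https://main_link", link.get('src'), i), and a blocked image shows up as a printed "skipped ... HTTP 403" line instead of a broken .png file.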

nasc
  • Cloudflare is there for the very reason you're not getting the images you want. I doubt anyone here will show you how to bypass that. – baduker May 20 '21 at 10:38
  • @baduker Is it not possible, or is it illegal? – nasc May 20 '21 at 10:47
  • You didn't share the URL, but I'd guess that what you're trying to do might be against the terms of service. Also, Cloudflare is a million-dollar business, and *as of September 2020, the company claims to block "an average of 72 billion threats per day, including some of the largest DDoS attacks in history."*. So good luck bypassing that. :] – baduker May 20 '21 at 10:53
  • T_T I'm not skilled enough to be a hacker, so I'll give up then. I just wanted to download some images because right-clicking and choosing 'Save image' was tedious and took a lot longer. I wanted to make software that, given lots of images, automatically makes a slideshow animation. – nasc May 20 '21 at 11:17
  • You can try explicitly setting your User-Agent header to a common browser one and the Origin header to the origin of the website you are scraping. Check which headers your browser sends when downloading the image; you can see them in the Network tab of Chrome's developer tools. Try adding those to your request. – Lucas Scott May 20 '21 at 15:33
  • Oh cool, I'll try that. – nasc May 20 '21 at 16:14
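Following Lucas Scott's suggestion, here is a minimal sketch of attaching those two headers with requests. The User-Agent value is a placeholder to be copied from your own browser's Network tab, and make_session and origin_of are hypothetical helper names. A requests.Session keeps its headers for every request, so the page request and each image request carry the same headers. Whether this helps against a 1020 block depends on how the site's Cloudflare firewall rule is configured, which the question doesn't reveal.

```python
from urllib.parse import urlsplit

import requests


def origin_of(url: str) -> str:
    """Return scheme://host[:port] for a URL (assumes no user:password@ part), the form an Origin header uses."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}"


def make_session(page_url: str, user_agent: str) -> requests.Session:
    """Create a Session that sends the given User-Agent and the site's Origin with every request.

    user_agent should be copied from the request headers shown in your
    browser's developer tools (Network tab); no real value is hard-coded here.
    """
    session = requests.Session()
    # update() keeps requests' other default headers (Accept, Accept-Encoding, ...).
    session.headers.update({
        "User-Agent": user_agent,
        "Origin": origin_of(page_url),
    })
    return session
```

Usage would be session = make_session("https://main_link", "<User-Agent copied from the browser>"), then session.get(...) wherever the question calls requests.get(...). Any other header seen in the Network tab can be added to the same dictionary.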
