
I am trying to log into a website. When I look at the output of print(g.text), I am not getting back the web page I expect; instead I get a Cloudflare page that says 'Checking your browser before accessing'.

import requests
import time

s = requests.Session()
s.get('https://www.off---white.com/en/GB/')

headers = {'Referer': 'https://www.off---white.com/en/GB/login'}

payload = {
    'utf8':'✓',
    'authenticity_token':'',
    'spree_user[email]': 'EMAIL@gmail.com',
    'spree_user[password]': 'PASSWORD',
    'spree_user[remember_me]': '0',
    'commit': 'Login'
}

r = s.post('https://www.off---white.com/en/GB/login', data=payload, headers=headers)

print(r.status_code)

g = s.get('https://www.off---white.com/en/GB/account')

print(g.status_code)
print(g.text)

Why is this occurring when I have set the session?

Pthomas

5 Answers


You might want to try this:

import cloudscraper

scraper = cloudscraper.create_scraper()  # returns a CloudScraper instance
# Or: scraper = cloudscraper.CloudScraper()  # CloudScraper inherits from requests.Session
print(scraper.get("http://somesite.com").text)  # => "<!DOCTYPE html><html><head>..."

It does not require a Node.js dependency. All credit goes to the cloudscraper PyPI page (https://pypi.org/project/cloudscraper/).
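Because CloudScraper inherits from requests.Session, the login flow from the question carries over almost unchanged. A minimal sketch, assuming the same Spree form fields as in the question (the authenticity_token value still has to be scraped from the login page's HTML before posting):

```python
def build_login_payload(email, password, token):
    # Same Spree login fields as in the question; 'token' is the
    # CSRF authenticity_token read from the login page's HTML.
    return {
        'utf8': '✓',
        'authenticity_token': token,
        'spree_user[email]': email,
        'spree_user[password]': password,
        'spree_user[remember_me]': '0',
        'commit': 'Login',
    }

if __name__ == '__main__':
    import cloudscraper  # pip install cloudscraper

    scraper = cloudscraper.create_scraper()  # drop-in for requests.Session
    payload = build_login_payload('EMAIL@gmail.com', 'PASSWORD', '')
    r = scraper.post('https://www.off---white.com/en/GB/login', data=payload)
    print(r.status_code)
```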

Eiri
  • It seems that cloudscraper is not totally free. In my case cloudscraper showed an error message, something like "cloudflare v2 detected. Not available in free version". So I think one needs to pay to access all cloudscraper features. – farynaa Oct 11 '21 at 10:30
  • It works just fine; I just checked it. Make sure that the following packages are up to date: `Requests >= 2.9.2` `Requests_toolbelt >= 0.9.1`, as described at https://pypi.org/project/cloudscraper/ under the dependencies heading. I hope this helps! – Eiri Mar 21 '22 at 13:47

This is due to the fact that the page uses Cloudflare's anti-bot page (or IUAM, "I'm Under Attack Mode").
Bypassing this check is quite difficult to do on your own, since Cloudflare changes its techniques periodically. Currently, it checks whether the client supports JavaScript, which can be spoofed.
I would recommend using the cfscrape module to bypass this.
To install it, use pip install cfscrape. You'll also need to install Node.js.
You can pass a requests session into create_scraper() like so:

import requests
import cfscrape

session = requests.Session()
session.headers = ...
scraper = cfscrape.create_scraper(sess=session)
Jeremiah

I had the same problem because they implemented Cloudflare in the API. I solved it this way:

import cloudscraper
import json
scraper = cloudscraper.create_scraper()
r = scraper.get("MY API").text
y = json.loads(r)
print(y)
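The json.loads step can also be replaced by the Response.json() helper that CloudScraper responses inherit from requests. A sketch with the same placeholder URL as above:

```python
import json

def parse_json_body(text):
    # Same parsing step as in the answer, factored out:
    # json.loads on the raw response text.
    return json.loads(text)

if __name__ == '__main__':
    import cloudscraper  # pip install cloudscraper

    scraper = cloudscraper.create_scraper()
    y = parse_json_body(scraper.get("MY API").text)
    # Equivalent one-liner: y = scraper.get("MY API").json()
    print(y)
```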
Alvaro G

curl and httpx avoid this problem. But how? I found that they work with HTTP/2 by default, whereas the requests library uses only HTTP/1.1.

So, for testing, I installed httpx with the h2 Python library to support HTTP/2 requests, and it works if I do: httpx --http2 'https://some.url'.

So the solution is to use a library that supports HTTP/2, for example httpx with h2.

It's not a complete solution, since it won't help to solve Cloudflare's anti-bot ("I'm Under Attack Mode", or IUAM) challenge.
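In Python code (rather than the httpx CLI), the same idea looks like this. A minimal sketch, assuming httpx is installed with HTTP/2 support; the URL is a placeholder:

```python
URL = 'https://some.url'  # placeholder target

if __name__ == '__main__':
    import httpx  # pip install 'httpx[http2]'

    # http2=True lets the client negotiate HTTP/2 with the server
    with httpx.Client(http2=True) as client:
        r = client.get(URL)
        print(r.http_version)  # 'HTTP/2' if the server negotiated it
        print(r.status_code)
```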

whoami
  • Actually, `httpx` can succeed even with just 1.1, without a need for 2. I suspect this could be due to a difference in ciphers used by `requests` and `httpx`. – Asclepius Dec 08 '22 at 04:31

You can scrape any Cloudflare-protected page by using the cfscrape module. Node.js is mandatory for the code to work correctly.

Download Node.js from https://nodejs.org/en/

import cfscrape  # pip install cfscrape

scraper = cfscrape.create_scraper()
res = scraper.get("https://www.example.com").text
print(res)
Praveen Kumar