
I can load this webpage in Google Chrome, but I can't fetch it with requests. Any idea what the compression problem is?

Code:

import requests


url = r'https://www.huffpost.com/entry/sean-hannity-gutless-tucker-carlson_n_60d5806ae4b0b6b5a164633a'
headers = {'Accept-Encoding':'gzip, deflate, compress, br, identity'}

r = requests.get(url, headers=headers)

Result:

ContentDecodingError: ('Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while decompressing data: incorrect header check'))
asked by someguy67
2 Answers


Use a User-Agent header that emulates a browser:

import requests

url = r'https://www.huffpost.com/entry/sean-hannity-gutless-tucker-carlson_n_60d5806ae4b0b6b5a164633a'
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36"}

r = requests.get(url, headers=headers)
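
If you make several requests, a Session can carry the browser-like User-Agent for you instead of passing the headers dict on every call. A minimal sketch (the UA string is just an example; any current browser UA should work):

```python
import requests

# Apply a browser-like User-Agent to every request made through this session.
session = requests.Session()
session.headers.update({
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                  "AppleWebKit/537.36 (KHTML, like Gecko) "
                  "Chrome/70.0.3538.77 Safari/537.36"
})

# r = session.get(url)  # same effect as passing headers= each time
```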
– RJ Adriaansen

You're getting a 403 Forbidden response, which you can confirm with requests.head. Use RJ's suggestion to get past HuffPost's bot blocking.

>>> requests.head(url)
<Response [403]>
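
To see how a 403 behaves without hitting the network, here is an offline sketch that builds a Response by hand (in real code you'd just call requests.head(url) as above and check the result):

```python
import requests

# Simulate the 403 seen above; a real call would populate this for you.
blocked = requests.Response()
blocked.status_code = 403

# Any 4xx/5xx status makes .ok False, and raise_for_status() raises HTTPError,
# so you can fail fast when the site rejects your request.
print(blocked.ok)
```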
– jcomeau_ictx