
I'm using Python `requests` to send HTTP requests to www.fredmeyer.com.

I can't even get past an initial GET request to this domain. A simple `requests.get` results in the connection hanging and never timing out. I've verified that I have access to this domain and am able to run the request on my local machine. Can anyone replicate this?

Stephen K
  • The community can't really help you unless you show us the code that you've written, and how it's failing. – James McPherson May 05 '18 at 08:23
  • The code I've written is literally just `requests.get('https://www.fredmeyer.com')`. How it's failing is that it isn't failing. When I execute `python script.py` it just hangs forever. – Stephen K May 05 '18 at 08:25
  • Maybe that website doesn't want anything other than web browsers to connect. Try adding this argument to the `get()` call: `headers={'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36'}` – Andrii Maletskyi May 05 '18 at 08:26
  • This hangs too with curl but seems to work with a browser. – Steffen Ullrich May 05 '18 at 08:42

1 Answer


The site seems to have some filtering enabled to prohibit bots or similar. The following HTTP request works currently with the site:

GET / HTTP/1.1
Host: www.fredmeyer.com
Connection: keep-alive
Accept: text/html
Accept-Encoding:

If the `Connection` header is removed or its value is changed to `close`, the request will hang. If the (empty) `Accept-Encoding` header is missing, it will also hang. If the `Accept` line is missing, the server returns 403 Forbidden.
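To reproduce the raw request above outside of `requests`, here is a minimal sketch that uses only the standard library. The helper names `build_request` and `send_raw`, the 10-second timeout, and the 200-byte read are assumptions chosen for illustration. The socket timeout means a filtered request raises an exception instead of hanging forever.

```python
import socket
import ssl

# Header set from the working request above, in the same order.
WORKING_HEADERS = [
    ("Connection", "keep-alive"),
    ("Accept", "text/html"),
    ("Accept-Encoding", ""),  # deliberately empty
]


def build_request(host, headers, path="/"):
    """Serialize a bare HTTP/1.1 GET request with exactly the given headers."""
    lines = [f"GET {path} HTTP/1.1", f"Host: {host}"]
    # rstrip() turns "Accept-Encoding: " into "Accept-Encoding:" for empty values.
    lines += [f"{name}: {value}".rstrip() for name, value in headers]
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def send_raw(host, request, timeout=10.0, max_bytes=200):
    """Send the request over TLS and return the first bytes of the response.

    Raises a timeout error if the server does not answer within `timeout`
    seconds (hypothetical default; adjust as needed).
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, 443), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            tls.sendall(request)
            return tls.recv(max_bytes)
```

Calling `send_raw("www.fredmeyer.com", build_request("www.fredmeyer.com", WORKING_HEADERS))` should return the start of the response if the filter still behaves as described. Dropping one entry from `WORKING_HEADERS` makes it easy to check which header the filter reacts to.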

To access this site with `requests`, the following currently works for me:

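import requests

# Send an Accept header and an explicitly empty Accept-Encoding header.
# Setting 'User-Agent' to None tells requests to omit its default User-Agent.
headers = {'Accept': 'text/html', 'Accept-Encoding': '', 'User-Agent': None}
resp = requests.get('https://www.fredmeyer.com', headers=headers)
print(resp.text)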

Note that the heuristics used by the site to detect bots might change, so this might stop working in the future.
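Because the filter may change, it can help to check which headers `requests` actually prepares and to always pass a timeout, since `requests` does not apply one by default. The following is a rough sketch: the helper names `prepared_headers` and `fetch` and the 10-second timeout are assumptions for illustration. Note that this shows the headers `requests` hands to its transport layer; the underlying HTTP stack may still add or normalize headers on the wire.

```python
import requests

URL = "https://www.fredmeyer.com"
HEADERS = {"Accept": "text/html", "Accept-Encoding": "", "User-Agent": None}


def prepared_headers(url, headers):
    """Return the headers requests would prepare for a GET, without sending it."""
    session = requests.Session()
    request = requests.Request("GET", url, headers=headers)
    # prepare_request merges the session defaults with our headers and
    # drops any header whose value is None.
    prepared = session.prepare_request(request)
    return dict(prepared.headers)


def fetch(url, headers, timeout=10.0):
    """Send the GET with a timeout so a filtered request fails instead of hanging."""
    return requests.get(url, headers=headers, timeout=timeout)
```

With these helpers, `prepared_headers(URL, HEADERS)` can be inspected before sending anything, and `fetch(URL, HEADERS)` raises `requests.exceptions.Timeout` if the server stays silent for longer than the timeout.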

Steffen Ullrich
  • Thanks - the site uses Akamai for load balancing/content delivery and also uses it for traffic filtering. Seems they're using a very aggressive model to filter traffic. – Stephen K May 05 '18 at 17:43