
The response I am getting from requests looks like gibberish, and I'm trying to figure out whether the website has some security measure I need to account for, or whether I just need to decode the response differently.

I tried setting a few different encodings via r.encoding on the response object, with no success. I also tried chardet.detect on the response and got 'none'.

Without headers, I get 403 Forbidden. With the following headers, I sometimes get a good response and sometimes get gibberish. I've tried this over two days. On day 1, I consistently got gibberish from the same URLs. On day 2, some of the URLs that worked on day 1 were now returning gibberish. I don't know whether something changed in those pages or what else caused the change.

import requests

url = "https://www.rechem.ca/index.php?route=product/product&product_id=515"

headers = {
    'accept': 'text/html,application/xhtml+xml,application/xml;'
              'q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'en-US,en;q=0.9',
    'referer': 'https://www.google.ca',
    'upgrade-insecure-requests': '1',
    'user-agent': 'Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko'
}

r = requests.get(url, headers=headers)
print(r.content)

Here are the first few lines of the response:

b'\x0b\x9b9D\x14\xf3\x01P\x84\x0cs\xff\xf9\xfe\xb4\xff\xaeg\xfa\xf1\x7f\xaa\xfb\x82\xd9\xd8X\x18chA\xbb\tI\x9a\xcc\r\tm\xd8\xc7\x939\xd8\x07|6\xb2\xe4J2\xc4\xbd\xbd\xff\xff~\x96\xbb\xadc\xd5Z\x95\x02J\xf3H\x84\x163\x85\x01\xfe\xc5]\xce\x94s\xef}\xefC\xb2\xc0\x94d\x97-\xa3\xb3\x8c\xbaJ\xff\xff\x8b\x91=U#\xe7T\xad#5\xa7^\xa5U\x7f\xdd\xf8c\xd8~\xf5\xafk\xbf\xdc\x12Dx\x82\x1d\xb6L\xc6\xb4\xce\xf9\x0f*!\xb0\xab\x17\xc1\xc4\t\xf0\xa8m\xae/\xe7w>\xc0\xfd\xcd\x9fb\x07\x0bw\x073\xf8S-\xea\xd0h\x90\xbd\xba\xbf\xab\xd6R\x90\xa1\xb3xL3\x91\xcb\x9b0\x8fe\xfe\xe9\xe0\x18r\xda\xde\xf0\xdb\xbb\xc1\xfd\xcd\xf0O\xb5xG\xc4(W^\x02"\xbb\x85\xd4\x84\xd5X\x94\xb9e\x8d\xceSX\ne\x16\x0b\xb3\x0c6\xb4\x14\xea\xef\x11\x02\x96\xb7\x0b\x9f\xb7\x14\xe6x\x17p\xd3\xf5\xb9\xb1/Q\xd3R\x8a\x1a=\x83\x955\x01\xcb\x00?\xfaA\xd7<\x91\xe2\xdf\xf1\xa5\xa8Ch\xfd\xfb\xd1\xe8|>\'\x8e\xca\x9a\x9a\xa4\xc4\x11\'\xe7\xb5\x0e

Does that look salvageable, or like deliberate nonsense? If it is deliberate, are there any suggestions on how I can better replicate a browser experience with requests?
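One thing worth ruling out before assuming deliberate obfuscation: the request above advertises `br` (Brotli) in `accept-encoding`, and as far as I know requests/urllib3 only decode Brotli when the optional `brotli` or `brotlicffi` package is installed. If the server picks `br` and the package is missing, `r.content` would stay compressed and look like random bytes. The sketch below is a minimal check under that assumption; `supported_encodings`, `decode_body`, and `build_headers` are hypothetical helper names, not part of requests.

```python
import gzip
import importlib.util
import zlib


def supported_encodings():
    """List the Accept-Encoding values this Python install can decode."""
    encodings = ["gzip", "deflate"]
    # Assumption: Brotli is only decodable when one of these optional
    # packages is installed next to requests/urllib3.
    if any(importlib.util.find_spec(name) for name in ("brotli", "brotlicffi")):
        encodings.append("br")
    return encodings


def decode_body(body, content_encoding):
    """Decompress raw bytes according to a Content-Encoding header value."""
    encoding = (content_encoding or "identity").strip().lower()
    if encoding == "gzip":
        return gzip.decompress(body)
    if encoding == "deflate":
        return zlib.decompress(body)
    if encoding == "br":
        try:
            import brotli  # optional dependency
        except ImportError:
            import brotlicffi as brotli  # alternative optional dependency
        return brotli.decompress(body)
    return body  # identity or unknown encoding: return unchanged


def build_headers(user_agent):
    """Hypothetical helper: only advertise encodings we can actually decode."""
    return {
        "accept-encoding": ", ".join(supported_encodings()),
        "user-agent": user_agent,
    }


# Possible usage with requests (not executed here):
#   r = requests.get(url, headers=build_headers("Mozilla/5.0 ..."))
#   print(r.headers.get("Content-Encoding"))  # which encoding the server chose
#   print(r.text[:200])
# decode_body() is only meant for bytes that are still compressed, for
# example r.content when Content-Encoding is "br" and no Brotli package
# was installed at request time.
```

If `Content-Encoding` comes back as `br` on the gibberish responses, dropping `br` from the header or installing a Brotli package would be the first thing to try. If the header is absent or `gzip` and the body is still unreadable, compression is probably not the explanation.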

petezurich
melonfacedoom
  • I don't get this response. Where specifically in the response do you see this? – Error - Syntactical Remorse Apr 26 '19 at 19:10
  • To investigate, it always helps to do a curl (https://stackoverflow.com/questions/356705/how-to-send-a-header-using-a-http-request-through-a-curl-call) – Snehaa Ganesan Apr 26 '19 at 19:10
  • I don't get this response either: `r.content; b' – C.Nivs Apr 26 '19 at 19:11
  • Okay, thanks to those that checked. Good to know that it must be some kind of security feature then. – melonfacedoom Apr 26 '19 at 19:14
  • If you are sure you used the same set of headers and the same url, but got different responses on different days, it is possible that it is a server side issue. Try investigating the next time you get a bad response. – Snehaa Ganesan Apr 26 '19 at 19:15
  • Does it run normally for you on an [online IDE](https://repl.it/repls/VainBriskLivecd)? – Error - Syntactical Remorse Apr 26 '19 at 19:15
  • Yes, it runs normally for me there. I consistently get a bad response when running it from my machine, yet I can freely browse the website in Chrome. What is my request missing? – melonfacedoom Apr 26 '19 at 19:20
  • 'user-agent' seems to be the problem – Snehaa Ganesan Apr 26 '19 at 19:26
  • I got the same issue after changing the user-agent to the same one as my browser: "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36". This is a header I noticed in the browser request, but I don't know how to replicate it: `cookie: visid_incap_1319916=Vd6UUWMRT+6hRbJ2DjdKflyitFwAAAAAQUIPAAAAAAA31cidQQibV9InVxP0PFfm; language=en; currency=CAD; PHPSESSID=qpo1b3bktviv6am0q94kvvou23; incap_ses_1227_1319916=dn2AL7Iipk1jyfJnIC4HEQVLw1wAAAAAz3dwBVpcMBcdvrKnn8JNRg==` – melonfacedoom Apr 26 '19 at 19:29
  • And what do you get if you don't include user-agent in the header? – Snehaa Ganesan Apr 26 '19 at 19:30
  • I get 403 Forbidden – melonfacedoom Apr 26 '19 at 19:32
  • What platform are you on? If you are on *nix you could try `curl -sSL -D - -o /dev/null "https://www.rechem.ca/index.php?route=product/product&product_id=515"`. When I run this on Mac I don't get a `user-agent` header... – C.Nivs Apr 26 '19 at 19:38
  • Only Windows at work. I can try when I get home. – melonfacedoom Apr 26 '19 at 19:44
  • Where are you running the code? – Snehaa Ganesan Apr 26 '19 at 20:02
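
On the `cookie:` header quoted in the comments: one way to replay it is to split the header value into a dict and pass that through the `cookies=` argument of `requests.get`. The sketch below does only the parsing, uses placeholder values rather than real session tokens, and `parse_cookie_header` is a hypothetical helper name. It also assumes the `visid_incap_*` / `incap_ses_*` cookies are issued by a bot-protection layer and expire, so a copied value may stop working after a while.

```python
def parse_cookie_header(raw):
    """Split a browser 'cookie:' header value into a name -> value dict."""
    cookies = {}
    for pair in raw.split(";"):
        # partition() splits on the first '=' only, so values that end in
        # '=' padding are kept intact.
        name, sep, value = pair.strip().partition("=")
        if sep and name:
            cookies[name] = value
    return cookies


# Placeholder values for illustration, not real session tokens.
raw = "language=en; currency=CAD; PHPSESSID=abc123; incap_ses_example=dGVzdA=="
cookies = parse_cookie_header(raw)
print(cookies)

# With requests installed, the dict could then be sent as:
#   requests.get(url, headers=headers, cookies=cookies)
```

If the page loads correctly with the browser's cookies and fails without them, that would point to the protection layer rather than to an encoding problem.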

0 Answers