The response I am getting from requests looks like gibberish, and I'm trying to figure out whether the website has some security measure I need to account for, or whether I just need to decode the response differently.
I tried setting a few different encodings via r.encoding on the response object, without success. I also tried chardet.detect on the response and got None.
Without headers, I get 403 Forbidden. With the following headers, I sometimes get a good response and sometimes get gibberish. I've tried this over two days. On Day 1, I consistently got gibberish from the same URLs. On Day 2, some of the URLs that worked on Day 1 were now returning gibberish. I don't know whether something changed in those pages or what else caused the change.
import requests
url = "https://www.rechem.ca/index.php?route=product/product&product_id=515"
headers = {
    'accept': 'text/html,application/xhtml+xml,application/xml;'
              'q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'en-US,en;q=0.9',
    'referer': 'https://www.google.ca',
    'upgrade-insecure-requests': '1',
    'user-agent': 'Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko'
}
r = requests.get(url, headers=headers)
print(r.content)
Here are the first few lines of the response:
b'\x0b\x9b9D\x14\xf3\x01P\x84\x0cs\xff\xf9\xfe\xb4\xff\xaeg\xfa\xf1\x7f\xaa\xfb\x82\xd9\xd8X\x18chA\xbb\tI\x9a\xcc\r\tm\xd8\xc7\x939\xd8\x07|6\xb2\xe4J2\xc4\xbd\xbd\xff\xff~\x96\xbb\xadc\xd5Z\x95\x02J\xf3H\x84\x163\x85\x01\xfe\xc5]\xce\x94s\xef}\xefC\xb2\xc0\x94d\x97-\xa3\xb3\x8c\xbaJ\xff\xff\x8b\x91=U#\xe7T\xad#5\xa7^\xa5U\x7f\xdd\xf8c\xd8~\xf5\xafk\xbf\xdc\x12Dx\x82\x1d\xb6L\xc6\xb4\xce\xf9\x0f*!\xb0\xab\x17\xc1\xc4\t\xf0\xa8m\xae/\xe7w>\xc0\xfd\xcd\x9fb\x07\x0bw\x073\xf8S-\xea\xd0h\x90\xbd\xba\xbf\xab\xd6R\x90\xa1\xb3xL3\x91\xcb\x9b0\x8fe\xfe\xe9\xe0\x18r\xda\xde\xf0\xdb\xbb\xc1\xfd\xcd\xf0O\xb5xG\xc4(W^\x02"\xbb\x85\xd4\x84\xd5X\x94\xb9e\x8d\xceSX\ne\x16\x0b\xb3\x0c6\xb4\x14\xea\xef\x11\x02\x96\xb7\x0b\x9f\xb7\x14\xe6x\x17p\xd3\xf5\xb9\xb1/Q\xd3R\x8a\x1a=\x83\x955\x01\xcb\x00?\xfaA\xd7<\x91\xe2\xdf\xf1\xa5\xa8Ch\xfd\xfb\xd1\xe8|>\'\x8e\xca\x9a\x9a\xa4\xc4\x11\'\xe7\xb5\x0e
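One check that might separate a character-encoding problem from a compression problem is to look at what the server says about the body. The sketch below is only an illustration: `guess_compression` is a hypothetical helper name, and it assumes it is given the response headers (for example `r.headers`) and the raw bytes (for example `r.content`). It reports the declared Content-Encoding if there is one, and otherwise only sniffs for the gzip signature.

```python
def guess_compression(headers, body):
    """Return a best-guess label for how `body` is compressed.

    Hypothetical helper: `headers` is any mapping of response headers
    (such as r.headers) and `body` is the raw bytes (such as r.content).
    """
    # Lower-case the header names so a plain dict works as well as r.headers.
    lowered = {name.lower(): value for name, value in headers.items()}
    declared = lowered.get("content-encoding", "").strip().lower()
    if declared:
        # Typical values are "gzip", "deflate" or "br" (Brotli).
        return declared
    # gzip streams start with the two bytes 0x1f 0x8b.
    if body[:2] == b"\x1f\x8b":
        return "gzip"
    # Brotli has no magic number, so it cannot be recognised from the bytes alone.
    return "unknown"
```

With the request above this would be called as `guess_compression(r.headers, r.content)`. Since the headers I send advertise `br`, a result of "br" would suggest the body is Brotli-compressed rather than deliberately scrambled; as far as I understand, requests only decodes Brotli transparently when the `brotli` or `brotlicffi` package is installed.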
Does that look salvageable, or like deliberate nonsense? If it is deliberate, are there any suggestions on how I can better replicate a browser experience with requests?