This is a General Answer i wrote because i had a similar problem, your problem might be that the web-server asks you to add those cookies to your further requests. You've set your cookies to ''
, so they are discarded and your new cookies appended to the end of the headers as per the servers request.
What if we just use get():
import requests
import logging
import http.client as http_client
http_client.HTTPConnection.debuglevel = 1
#init logging
logging.basicConfig()
logging.getLogger().setLevel(logging.DEBUG)
requests_log = logging.getLogger("requests.packages.urllib3")
requests_log.setLevel(logging.DEBUG)
requests_log.propagate = True
requests.get("http://google.com", allow_redirects=False)
Here I've enabled logging so you see the requests as they're being made(logging code not shown in future examples). This yields the output:
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): google.com:80
send: b'GET / HTTP/1.1\r\nHost: google.com\r\nUser-Agent: python-requests/2.21.0\r\nAccept-Encoding: gzip, deflate\r\nAccept: */*\r\nConnection: keep-alive\r\n\r\n'
...
As you see, requests initiates some headers even when we haven't told it to. Now what happens if we pass some headers to it in some format we want?
import requests
headers = {
"accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3",
"accept-encoding": "gzip, deflate, br",
"accept-language": "en-US,en;q=0.9",
"upgrade-insecure-requests": "1",
"user-agent": "Mozilla/5.0"
}
requests.get("http://google.com", headers=headers, allow_redirects=False)
Here we expect "user-agent" to show up at the end of our request, however the output shows otherwise:
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): google.com:80
send: b'GET / HTTP/1.1\r\nHost: google.com\r\nuser-agent: Mozilla/5.0\r\naccept-encoding: gzip, deflate, br\r\naccept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3\r\nConnection: keep-alive\r\naccept-language: en-US,en;q=0.9\r\nupgrade-insecure-requests: 1\r\n\r\n'
...
The "user-agent" shows up in the middle! What gives? Lets take a look at some source code from the library.
def __init__(self):
#: A case-insensitive dictionary of headers to be sent on each
#: :class:`Request <Request>` sent from this
#: :class:`Session <Session>`.
self.headers = default_headers()
...
When we initiate a Session
, the first thing it does is assign it default headers, and any additional headers provided by the user "indirectly"(through a function) will be appended to the default one.
This is a problem, as when you append two dicts(even OrderedDicts), the result retains the order of the original dict. We can see this in the above example where the "user-agent" attribute retained it's position of second in the dict.
This is the code for the appending process if you're interested:
def merge_setting(request_setting, session_setting, dict_class=OrderedDict):
"""Determines appropriate setting for a given request, taking into account
the explicit setting on that request, and the setting in the session. If a
setting is a dictionary, they will be merged together using `dict_class`
"""
if session_setting is None:
return request_setting
if request_setting is None:
return session_setting
# Bypass if not a dictionary (e.g. verify)
if not (
isinstance(session_setting, Mapping) and
isinstance(request_setting, Mapping)
):
return request_setting
merged_setting = dict_class(to_key_val_list(session_setting))
merged_setting.update(to_key_val_list(request_setting))
# Remove keys that are set to None. Extract keys first to avoid altering
# the dictionary during iteration.
none_keys = [k for (k, v) in merged_setting.items() if v is None]
for key in none_keys:
del merged_setting[key]
return merged_setting
So what's the fix?
You'll have to override the default header completely. The way i can think to do this is to use a Session
, and then replace the headers dict directly:
session = requests.Session()
headers = {
"accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3",
"accept-encoding": "gzip, deflate, br",
"accept-language": "en-US,en;q=0.9",
"cookie": "Cookie: Something",
"upgrade-insecure-requests": "1",
"user-agent": "Mozilla/5.0"
}
# session.cookies.add_cookie_header(session)
session.headers = headers
a = session.get("https://google.com/", allow_redirects=False)
Which yield the desired output, without the need for any OrderedDict
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): google.com:443
send: b'GET / HTTP/1.1\r\nHost: google.com\r\naccept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3\r\naccept-encoding: gzip, deflate, br\r\naccept-language: en-US,en;q=0.9\r\ncookie: Cookie: Something\r\nupgrade-insecure-requests: 1\r\nuser-agent: Mozilla/5.0\r\n\r\n'
...
The above example proves that everything has stayed where it's supposed to be, even if you check response.request.headers
everything should be in order(at least for me it is)
P.S: I haven't bothered to check if using an OrderedDict makes a difference here, but if you have any issues still, try using one instead.