So, you can think of the contents of a request as just text, right? And not just any text, but text that only accepts a relatively limited set of characters.
With that in mind, it all boils down to how to serialize "complex" data structures into text. I recently answered another question about files that is kind of a similar idea.
If you have a bunch of key=value
parameters, you could use a simple "trick":
- Control names and values are escaped. Space characters are replaced by +, and then reserved characters are escaped as described in [RFC1738], section 2.2: Non-alphanumeric characters are replaced by %HH, a percent sign and two hexadecimal digits representing the ASCII code of the character. Line breaks are represented as "CR LF" pairs (i.e., %0D%0A).
- The control names/values are listed in the order they appear in the document. The name is separated from the value by = and name/value pairs are separated from each other by &.
So this data:
{a="foo", b="bar baz"}
could be serialized into text following the specification above as:
a=foo&b=bar+baz
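The serialization rules quoted above can be tried out with Python's standard library. This is only a small sketch: the variable names are made up for illustration, and urllib.parse.urlencode is used here as a stand-in for whatever your HTTP client does internally.

```python
from urllib.parse import urlencode, parse_qs

data = {"a": "foo", "b": "bar baz"}

# Spaces become "+", pairs are joined with "&", names and values with "="
body = urlencode(data)
print(body)  # a=foo&b=bar+baz

# Reserved characters and line breaks are percent-escaped as %HH
escaped = urlencode({"q": "x&y=z", "t": "one\r\ntwo"})
print(escaped)  # q=x%26y%3Dz&t=one%0D%0Atwo

# Decoding reverses the process (every value comes back as a list of strings)
print(parse_qs(body))  # {'a': ['foo'], 'b': ['bar baz']}
```

Note that the decoded values are always strings, which is one of the limitations of this format.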
That serialization format is identified as application/x-www-form-urlencoded in the request's Content-Type header. That header tells the server that receives it something like "Hey! The data coming in my body is serialized following the convention that separates keys from values using the = symbol, splits key/value pairs using &, replaces spaces with +... and so on"
(!) Very important: That is the format used by the requests module on a POST when you pass a dict through data=, unless told otherwise.
Another format, which allows more flexibility (such as preserving basic types or nesting structures), is JSON. That is the format that the Google server "wants", and in order to tell servers that the "text" contained in the request's body follows the JSON standard (or convention), the Content-Type header must be set to 'application/json'.
What your Google server appears to have been doing upon receiving a request was checking the Content-Type header, and if it wasn't JSON, it gave you a 400 error to indicate "Oh, I don't understand this format... I want JSON!"
That's why you have to specify the JSON header.
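To make that guess concrete, here is a rough sketch of the kind of check a server could be doing. Everything in it is hypothetical: handle_request is a made-up function, not Google's actual code, and the error messages are invented.

```python
import json

def handle_request(headers, body):
    """Hypothetical server-side check: accept only JSON bodies."""
    content_type = headers.get("Content-Type", "")
    # Ignore optional parameters such as "; charset=utf-8"
    media_type = content_type.split(";")[0].strip().lower()
    if media_type != "application/json":
        return 400, "Expected application/json"
    try:
        payload = json.loads(body)
    except ValueError:
        return 400, "Body is not valid JSON"
    return 200, payload

payload = {"a": "foo", "b": "bar baz"}

# Form-encoded body with the form header: rejected
form_status, _ = handle_request(
    {"Content-Type": "application/x-www-form-urlencoded"}, "a=foo&b=bar+baz")

# JSON body with the JSON header: accepted and parsed
json_status, parsed = handle_request(
    {"Content-Type": "application/json"}, json.dumps(payload))

print(form_status)          # 400
print(json_status, parsed)  # 200 {'a': 'foo', 'b': 'bar baz'}
```

The point is only that the header is what tells the receiver which parser to use; the body alone is just text.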
There's an example comparing both formats here.
You can also see it more clearly because recent versions of the requests module can do the JSON serialization for you. Since the JSON format has become so common, you can pass data in a Python structure (a dict, for instance) through the json= argument, and the module will call json.dumps and set the header for you. This also lets you "introspect" a little what the body will look like (to see the differences more clearly, maybe).
Check this out:
from requests import Request
data = {
    'a': 'foo-1 baz',
    'b': 5,
    'c': [1, 2, 3],
    'd': '6'
}
req = Request('POST', 'http://foo.bar', data=data)
prepped = req.prepare()
print("Normal headers: %s" % prepped.headers)
print("Normal body: %s" % prepped.body)
req = Request('POST', 'http://foo.bar', json=data)
prepped = req.prepare()
print("Json headers: %s" % prepped.headers)
print("Json body: %s" % prepped.body)
Outputs:
Normal headers: {'Content-Length': '31', 'Content-Type': 'application/x-www-form-urlencoded'}
Normal body: d=6&a=foo-1+baz&c=1&c=2&c=3&b=5
Json headers: {'Content-Length': '52', 'Content-Type': 'application/json'}
Json body: b'{"d": "6", "a": "foo-1 baz", "c": [1, 2, 3], "b": 5}'
See the difference? JSON can distinguish the strings foo-1 baz and 6 (by wrapping them in ") from the integer 5, while x-www-form-urlencoded can't (see how the form encoding doesn't differentiate between the integer 5 and the string 6). Same with the list: thanks to the [ character, the server can tell that c is a list (and of integers).
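If you want to see that loss of type information from the receiving side, the following sketch decodes both bodies with the standard library only. Using urlencode with doseq=True is my assumption for mimicking the c=1&c=2&c=3 output shown above; requests may build the body differently internally.

```python
import json
from urllib.parse import urlencode, parse_qs

data = {"a": "foo-1 baz", "b": 5, "c": [1, 2, 3], "d": "6"}

# doseq=True expands the list into repeated c=... pairs
form_body = urlencode(data, doseq=True)
json_body = json.dumps(data)

form_decoded = parse_qs(form_body)
json_decoded = json.loads(json_body)

# Form encoding: everything comes back as strings
print(form_decoded)  # {'a': ['foo-1 baz'], 'b': ['5'], 'c': ['1', '2', '3'], 'd': ['6']}

# JSON: the integer, the strings and the list keep their types
print(json_decoded)  # {'a': 'foo-1 baz', 'b': 5, 'c': [1, 2, 3], 'd': '6'}
```

After the round trip, the JSON version is equal to the original dict, while the form version can no longer tell 5 from '5'.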