Python: BaseHTTPRequestHandler - Read raw post

Question

How do I read the raw HTTP POST body as a string? I've found several solutions for reading a parsed version of the POST data, but the project I'm working on submits a raw XML payload without a header. I am trying to find a way to read the POST data without it being parsed into a key => value array.

score 26 · Answer 1 · edited Oct 07 '21 at 07:34

`self.rfile.read(int(self.headers.getheader('Content-Length')))` will return the raw HTTP POST data as a string.

Breaking it down:

The header 'Content-Length' specifies how many bytes the HTTP POST data contains.
self.headers.getheader('Content-Length') returns the content length (value of the header) as a string.
This has to be converted to an integer before passing as parameter to self.rfile.read(), so use the int() function.
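For readers on Python 3, the same steps might look roughly like the sketch below. It assumes `self.headers.get(...)` in place of `getheader` (as the comments further down point out). The names `read_raw_body`, `RawPostHandler` and `main`, and the port number, are made up for illustration. Note that the body comes back as `bytes`, not `str`.

```python
import io
from http.server import BaseHTTPRequestHandler, HTTPServer


def read_raw_body(headers, rfile):
    """Hypothetical helper: return exactly Content-Length bytes from rfile."""
    # The header value is a string, so convert it before calling read().
    # Defaulting to 0 (read nothing) when the header is absent is an assumption.
    length = int(headers.get("Content-Length", 0))
    return rfile.read(length)


class RawPostHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = read_raw_body(self.headers, self.rfile)  # unparsed bytes
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(b"received %d bytes\n" % len(body))


def main(port=8000):
    # Port 8000 is an arbitrary choice for this sketch.
    HTTPServer(("127.0.0.1", port), RawPostHandler).serve_forever()


# In-memory check of the helper; no network is involved.
# A plain dict stands in for self.headers here (unlike the real headers
# object, a dict lookup is case-sensitive).
sample = b"<a>1</a>"
fake_headers = {"Content-Length": str(len(sample))}
assert read_raw_body(fake_headers, io.BytesIO(sample + b"ignored")) == sample
```

Calling `main()` starts the server; something like `curl --data-binary '<a>1</a>' http://127.0.0.1:8000/` should then exercise `do_POST`.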

~~Also, note that the header name is case sensitive so it has to be specified as 'Content-Length' only.~~

Edit: Apparently the header field name is not case-sensitive (at least in Python 2.7.5), which I believe is the correct behaviour, since https://www.rfc-editor.org/rfc/rfc2616 states:

Each header field consists of a name followed by a colon (":") and the field value. Field names are case-insensitive.

edited Oct 07 '21 at 07:34

Community


answered Jan 02 '14 at 09:34

Sindhuri Kuppasad


Please be more verbose, I have no idea what you are suggesting. – jb. Jan 02 '14 at 09:53
@jb: I added more details to the answer. Let me know if there is anything specific that still needs to be elaborated. – Sindhuri Kuppasad Jan 02 '14 at 10:49
@SindhuriKuppasad, the header name is not case-sensitive. The following statements both return the content length in my tests: `self.headers.getheader('content-length')` and `self.headers.getheader('content-LENGTH')` – famzah Nov 07 '15 at 06:38
@famzah, that's interesting. I cannot recall which version of Python I was using when I wrote this answer, but the case had mattered and that was the reason I put the answer here in the first place. I checked on 2.7.5 now and you're right, the case doesn't matter. – Sindhuri Kuppasad Nov 10 '15 at 05:56
In Python 3 it would be `self.headers.get('content-length')` – Amarghosh Apr 12 '18 at 13:28

smakateer · Accepted Answer · 2013-07-26T19:47:32.880

20

I think `self.rfile.read(int(self.headers.getheader('content-length')))` should return the raw data as a string. According to the docs directly inside the BaseHTTPRequestHandler class:

- rfile is a file object open for reading positioned at the start of the optional input data part;

edited Jul 26 '13 at 19:47

answered Jul 26 '13 at 18:33

smakateer


After trying it and doing some quick googling, this operation blocks execution for me as well as for others. – kwolfe Jul 26 '13 at 18:44
Need to supply content length: data = self.rfile.read(int(self.headers.getheader('content-length'))) – kwolfe Jul 26 '13 at 18:53
Yes, sorry. It's blocking because the rfile object is a socket, and calling `read()` is basically saying 'read until there's nothing left to read' but there's more to read so long as the socket is open, so it hangs and waits for incoming content. Servers avoid the hanging by ALWAYS specifying HOW MUCH content to read. Sorry, I should have put that in in the first place. – smakateer Jul 26 '13 at 19:43
With Python 3.5 you need to use "get" instead of "getheader". – CyberFonic Nov 03 '17 at 05:23
What happens when there is no "content-length" header? Your server just crashes? – Jamie Marshall Sep 18 '18 at 20:20
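On the missing-header question: with the one-liner as written, `int(None)` raises a `TypeError` inside the handler. As far as I know, the stock `socketserver` machinery logs the traceback and drops that connection rather than bringing the whole server down. A more deliberate option, sketched here for Python 3, is to answer 411 Length Required. `parse_content_length` and `StrictPostHandler` are hypothetical names, and echoing the body back is only there to show that it was read.

```python
from http import HTTPStatus
from http.server import BaseHTTPRequestHandler


def parse_content_length(value):
    """Return a non-negative int, or None if the header is missing or malformed."""
    if value is None:
        return None
    try:
        length = int(value)
    except ValueError:
        return None
    return length if length >= 0 else None


class StrictPostHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = parse_content_length(self.headers.get("Content-Length"))
        if length is None:
            # 411 Length Required: refuse rather than guess how much to read.
            self.send_error(HTTPStatus.LENGTH_REQUIRED)
            return
        body = self.rfile.read(length)
        self.send_response(HTTPStatus.OK)
        self.end_headers()
        self.wfile.write(body)  # echo, just to show the bytes were read
```

Rejecting the request is one design choice among several. Treating a missing header as an empty body would also be defensible for some clients.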

score 4 · Answer 3 · answered Sep 12 '19 at 06:40

For Python 3.7, the following worked for me:

rawData = (self.rfile.read(int(self.headers['content-length']))).decode('utf-8')

This draws on the other answers to this question and on two other linked posts. The last link actually contains the full solution.
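If the client does send a `Content-Type` header with a charset, the hard-coded `'utf-8'` could be replaced by whatever the request declares. The sketch below assumes Python 3, where `self.headers` is an `http.client.HTTPMessage` (a subclass of `email.message.Message`) and so offers `get_content_charset()`. `decode_body` is a hypothetical helper, and a plain `Message` stands in for `self.headers`.

```python
from email.message import Message


def decode_body(raw, charset=None, default="utf-8"):
    """Hypothetical helper: decode raw bytes, falling back to a default charset."""
    return raw.decode(charset or default)


# Message stands in for self.headers here; http.client.HTTPMessage subclasses it.
headers = Message()
headers["Content-Type"] = "application/xml; charset=iso-8859-1"
raw = "<name>José</name>".encode("iso-8859-1")

# get_content_charset() returns None when no charset is declared,
# in which case decode_body falls back to UTF-8.
text = decode_body(raw, headers.get_content_charset())
```

Inside a handler this would become `decode_body(self.rfile.read(length), self.headers.get_content_charset())`. When the payload arrives without any headers describing it, as in the question, the fallback encoding is still a guess.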

Ashiq


@JulesG.M. that's what I found in the last link I provided. Also, utf-8 worked for the contents I was reading as raw data on the server side. If the sender encodes it in any other format, that value will also need to change for decoding. – Ashiq Jul 23 '21 at 09:09

score 2 · Answer 4 · answered Nov 02 '20 at 19:10

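The read() method on the io.BufferedIOBase object reads until EOF. Not all browsers close the connection after sending the body, so EOF may never arrive and a bare read() hangs. Reading exactly Content-Length bytes is a good solution. Using the read1() method also worked for me. It returns whatever data is available, using at most one call to the underlying raw read, so it can return fewer bytes than the full body.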
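Because `read1()` can return a short chunk, one way to combine it with Content-Length is to loop until the expected number of bytes has arrived. The sketch below uses a hypothetical `read_exactly` helper and an in-memory `io.BytesIO` in place of `self.rfile`, so it can run without a socket.

```python
import io


def read_exactly(stream, n):
    """Collect up to n bytes using read1(), which may return short chunks."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = stream.read1(remaining)
        if not chunk:  # stream ended before n bytes arrived
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


# BytesIO stands in for self.rfile; the trailing bytes play the role of
# whatever the client sends after this request body.
stream = io.BytesIO(b"<a>1</a>next request bytes")
body = read_exactly(stream, 8)
```

In a handler, `stream` would be `self.rfile` and `n` the parsed Content-Length. With a buffered `rfile`, a plain `self.rfile.read(n)` already does this looping internally, so the helper mainly shows what `read1()` leaves to the caller.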
