get http raw (unparsed) response in http.client or python-requests

Question

I'm using Python to make HTTP requests. I need the raw HTTP response, which looks like this:

```
HTTP/1.1 200 OK
Date: Mon, 19 Jul 2004 16:18:20 GMT
Server: Apache
Last-Modified: Sat, 10 Jul 2004 17:29:19 GMT
ETag: "1d0325-2470-40f0276f"
Accept-Ranges: bytes
Content-Length: 9328
Connection: close
Content-Type: text/html

<HTML>
<HEAD>
... the rest of the home page...
```

In python-requests I tried `response.raw`, but that isn't the raw HTTP response; it's just the raw body.

Is there any way to achieve this goal without using socket?

P.S. I don't want to rebuild the raw response using parsed parts.

So what do you understand the 'raw response' to be? The header section? That's not available in raw form. — Martijn Pieters, Apr 22 '19 at 14:00
So you just need the HTTP headers, and not the body, correct? — Mark Stewart, Apr 22 '19 at 14:03
@MarkStewart No. I need the whole response in the format shown above. — Juda Xovex, Apr 22 '19 at 14:04
@JudaXovex: that's the status line, the headers, and the body. The status line and headers are not available in raw form. — Martijn Pieters, Apr 22 '19 at 14:09
@MartijnPieters I take **HTTP raw response** to mean: status-line + CRLF + headers + CRLF + CRLF + body. I can get that using `socket`, but then I have to handle SSL, compression and many other things myself. P.S. Some HTTP servers send LF+CR! P.S. [RFC 2616: section 6](https://www.w3.org/Protocols/rfc2616/rfc2616-sec6.html) — Juda Xovex, Apr 22 '19 at 14:26
@JudaXovex: servers that respond with `LFCR` instead of `CRLF` are broken, and the `http.client` / `urllib3` / `requests` stack is not able to parse their responses. — Martijn Pieters, Apr 22 '19 at 14:29
@JudaXovex: specifically, the `socket.readline()` call would return the line up to the `\n` (LF) linefeed, and the next line would start with the `\r` (CR) character, which would cause it to be rejected as a header line. — Martijn Pieters, Apr 22 '19 at 14:34
@MartijnPieters _... stack is not able to parse their responses_. It's not true. Try it! `nc -lvc -p 1234 < http-response-with-only-lf.txt` — Juda Xovex, Apr 22 '19 at 14:56
@JudaXovex: You seem to be talking about a response that uses only LF bytes and not CR. That’s not the same thing as having CR and LF bytes swapped. — Martijn Pieters, Apr 22 '19 at 15:00
@MartijnPieters You are right, that only applies to the headers. But I mean abnormal responses in general. (Thanks a lot for your time!) — Juda Xovex, Apr 22 '19 at 15:08
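Once you do have the raw bytes, the layout described in these comments (status line, header lines, an empty line, then the body, as in RFC 2616 section 6) is easy to take apart. The following is a minimal sketch; `split_raw_response` is a hypothetical helper name, accepting bare LF line endings is an assumption prompted by the LF-only servers mentioned above, and swapped LFCR endings are not handled.

```python
import re


def split_raw_response(raw: bytes):
    """Split raw HTTP/1.x response bytes into (status_line, header_lines, body).

    Hypothetical helper. Expects: status-line CRLF, header lines each ending
    in CRLF, an empty line (CRLF), then the body. Bare LF endings are also
    accepted; swapped LFCR endings are not.
    """
    # The first empty line ends the head; CRLF or bare LF on either side.
    match = re.search(rb"\r?\n\r?\n", raw)
    if match is None:
        raise ValueError("no empty line terminating the header section")
    head, body = raw[:match.start()], raw[match.end():]
    lines = re.split(rb"\r?\n", head)
    return lines[0], lines[1:], body
```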

Martijn Pieters · Accepted Answer · score 2 · 2020-03-24T12:16:29.620

requests doesn't expose the status line and headers in raw form. You never need these in raw form; an RFC-compliant response can be reconstructed trivially from the data you do have. requests is built on the urllib3 library, which in turn uses the Python standard library http.client module, and that module doesn't give you the raw data either.
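To make "reconstructed trivially" concrete, here is a minimal sketch that works from the parsed http.client.HTTPResponse at the bottom of this stack. `rebuild_head` is a hypothetical helper, not part of any library. It joins everything with CRLF, so whitespace around header values, folded header lines and nonstandard line endings from the original response are not preserved; the output only matches byte for byte when the server already sent a clean, RFC-compliant head.

```python
import http.client


def rebuild_head(resp: http.client.HTTPResponse) -> bytes:
    """Rebuild the status line and header block from a parsed HTTPResponse.

    Hypothetical helper. http.client records the protocol version only as
    10 or 11, so the original version string is not always reproduced.
    """
    version = "HTTP/1.0" if resp.version == 10 else "HTTP/1.1"
    lines = [f"{version} {resp.status} {resp.reason}"]
    # getheaders() keeps the received order and repeated header names.
    lines += [f"{name}: {value}" for name, value in resp.getheaders()]
    # http.client decodes header bytes as ISO-8859-1, so encode them back that way.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("iso-8859-1")
```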

Instead, the status line and headers are parsed directly into their constituent parts, in http.client.HTTPResponse._read_status() and http.client.parse_headers() (the latter delegating to the email.parser.Parser().parsestr() method to parse the headers into an http.client.HTTPMessage() instance). Only the results of these parse operations are kept.
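The header half of that parse can be tried in isolation by feeding http.client.parse_headers() a file-like object. The sketch below uses an in-memory io.BytesIO as a stand-in for the socket file; note that parse_headers() is a module-level helper that the http.client documentation does not list, so treat it as an implementation detail. What comes back is an HTTPMessage mapping, and the extra spaces after `Content-Type:` in the input are already gone.

```python
import http.client
import io

# Header section as it might arrive on the wire (note the extra spaces).
raw_headers = (
    b"Server: Apache\r\n"
    b'ETag: "1d0325-2470-40f0276f"\r\n'
    b"Content-Type:   text/html\r\n"
    b"\r\n"
)

# parse_headers() reads lines up to the empty line and hands them to email.parser.
msg = http.client.parse_headers(io.BytesIO(raw_headers))

print(type(msg).__name__)   # HTTPMessage
print(msg.keys())           # ['Server', 'ETag', 'Content-Type']
print(msg["content-type"])  # text/html  (lookup is case-insensitive)
```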

You could try to wrap the urllib3 connection object (via the get_connection() hook implemented on a requests transport adapter). Connection objects have a .connect() method with supporting methods that create socket objects; if you wrapped those in a file-like object and then peeked at the data returned by the .readline() calls, you could capture and store the raw data there.
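The wrapping idea can be sketched at the http.client level. This is a swapped-in variant, not the get_connection() route described above: instead of wrapping the urllib3 connection's socket, it wraps the file object that http.client.HTTPResponse reads from (created with `sock.makefile("rb")` in its constructor) and installs the subclass through `HTTPConnection.response_class`. `RecordingReader`, `RecordingHTTPResponse` and `RecordingHTTPConnection` are hypothetical names, and plugging this into urllib3 or requests is not shown. Only bytes that http.client actually reads are captured, so the body appears once you call `read()`.

```python
import http.client


class RecordingReader:
    """File-like wrapper that keeps a copy of every byte read through it."""

    def __init__(self, raw):
        self._raw = raw
        self.captured = bytearray()

    def readline(self, limit=-1):
        # Used by http.client for the status line, headers and chunk sizes.
        line = self._raw.readline(limit)
        self.captured += line
        return line

    def read(self, amt=None):
        data = self._raw.read() if amt is None else self._raw.read(amt)
        self.captured += data
        return data

    def readinto(self, buf):
        n = self._raw.readinto(buf)
        if n:
            self.captured += bytes(buf[:n])
        return n

    def close(self):
        self._raw.close()


class RecordingHTTPResponse(http.client.HTTPResponse):
    """HTTPResponse whose socket file is wrapped so the raw bytes are kept."""

    def __init__(self, sock, *args, **kwargs):
        super().__init__(sock, *args, **kwargs)
        # The base class set self.fp = sock.makefile("rb"); wrap it before begin().
        self.recorder = RecordingReader(self.fp)
        self.fp = self.recorder


class RecordingHTTPConnection(http.client.HTTPConnection):
    # getresponse() instantiates whatever class is stored here.
    response_class = RecordingHTTPResponse
```

Against a real server you would create `RecordingHTTPConnection(host)`, call `request()` and `getresponse()` as usual, `read()` the body, and then inspect `bytes(resp.recorder.captured)`. For HTTPS the same `response_class` override on a subclass of `http.client.HTTPSConnection` should work, since that class inherits `getresponse()` from `HTTPConnection`.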

However, if you are debugging a broken HTTP server, I wouldn't bother trying to bend requests and its stack to your will. Just use curl --include --raw <url> on the command line instead (perhaps with --verbose added).

Another option would be to use the http.client library directly: make the connection, send your outgoing headers with HTTPConnection.request(), then don't call getresponse() but just read directly from conn.sock.
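A minimal sketch of that last option for plain HTTP, assuming the server honours `Connection: close`, so that the end of the response is signalled by the server closing the socket. `fetch_raw` is a hypothetical helper. `request()` still adds a `Host` header and `Accept-Encoding: identity` for you, which asks the server not to compress the body; chunked framing, if any, is left in the returned bytes. For TLS, `http.client.HTTPSConnection` can be used the same way, since its `conn.sock` is an SSL-wrapped socket with the same `recv()` method.

```python
import http.client


def fetch_raw(host, path="/", port=None, timeout=10.0):
    """Send a GET with http.client, then read the reply straight off conn.sock.

    Hypothetical helper. Sends "Connection: close" so the server closes the
    socket after the response, which is how we know everything has been read.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        # request() opens the socket and writes the request line and headers.
        conn.request("GET", path, headers={"Connection": "close"})
        chunks = []
        while True:
            # Skip getresponse(): read unparsed bytes directly from the socket.
            data = conn.sock.recv(65536)
            if not data:  # server closed the connection
                break
            chunks.append(data)
        return b"".join(chunks)
    finally:
        conn.close()
```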

edited Mar 24 '20 at 12:16

answered Apr 22 '19 at 14:19

Martijn Pieters


You say _You never need these in raw form_, but I do need them, because I need to analyze the HTTP response format. – Juda Xovex Apr 22 '19 at 14:32
@JudaXovex: then `requests` may not be the library for your needs. Or any other library based on `http.client`. – Martijn Pieters Apr 22 '19 at 14:34
Is there any alternative library? – Juda Xovex Apr 22 '19 at 15:09
@JudaXovex: why not just use the [`curl` command line](https://unix.stackexchange.com/questions/29402/most-straightforward-way-of-getting-a-raw-unparsed-https-response)? – Martijn Pieters Apr 22 '19 at 15:13
There are too many requests, and executing an external command is too slow. – Juda Xovex Apr 22 '19 at 15:20
_but just read directly from conn.sock_. That's what I needed. Thanks a lot! – Juda Xovex Apr 22 '19 at 16:48

score 0 · Answer 2 · answered Aug 04 '23 at 12:14


response.raw does what you want

Answered here:

https://stackoverflow.com/a/56492298/1290627


Alan Hamlett

