How to parse HTTP raw bytes and get the HTTP content in Python?

Question

I use scapy to sniff some packets, and I get some HTTP response packets as bytes that I cannot parse. For example:

  b'HTTP/1.1 200 OK\r\nDate: Thu, 07 Dec 2017 02:44:18 GMT\r\nServer:Apache/2.4.18 (Ubuntu)\r\nLast-Modified: Tue, 14 Nov 2017 05:51:36 GMT\r\nETag: "2c39-55deafadf0ac0-gzip"\r\nAccept-Ranges: bytes\r\nVary: Accept-Encoding\r\nContent-Encoding: gzip\r\nContent-Length: 3186\r\nConnection: close\r\nContent-Type: text/html\r\n\r\n\x1f\x8b'

Is there a way to get the content part of this byte array so I can use the gzip library to decompress it? I don't want to use requests to fetch the HTTP response, because I only want to process the raw packets I already have.

score 4 · Accepted Answer · answered Dec 07 '17 at 03:52

There's no built-in way to parse a raw HTTP response like this and handle compression properly. I would use urllib3:

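import socket
from http.client import HTTPResponse
from io import BytesIO

import urllib3


class BytesIOSocket:
    """Minimal socket stand-in so http.client can read from a bytes buffer."""

    def __init__(self, content):
        self.handle = BytesIO(content)

    def makefile(self, mode):
        # http.client calls sock.makefile('rb') and reads the response from it
        return self.handle


def response_from_bytes(data):
    sock = BytesIOSocket(data)

    response = HTTPResponse(sock)
    response.begin()  # parse the status line and headers

    # urllib3 decodes the body according to Content-Encoding (gzip, deflate).
    # from_httplib is a urllib3 1.x API; newer major versions may not provide it.
    return urllib3.HTTPResponse.from_httplib(response)


if __name__ == '__main__':
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect(('httpbin.org', 80))
    sock.sendall(b'GET /gzip HTTP/1.1\r\nHost: httpbin.org\r\n\r\n')

    # A single recv() may return only part of a larger response
    raw_response = sock.recv(8192)
    sock.close()

    response = response_from_bytes(raw_response)
    print(response.headers)
    print(response.data)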
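If urllib3 is not an option, the same fake-socket trick seems to work with the standard library alone: http.client parses the status line and headers, and gzip handles the body. This is only a sketch under assumptions: `FakeSocket` and `parse_response` are names made up for illustration, the input must be a complete response (a body shorter than Content-Length makes `read()` raise `http.client.IncompleteRead`), and only the gzip encoding is handled.

```python
import gzip
from http.client import HTTPResponse
from io import BytesIO


class FakeSocket:
    """Hypothetical socket stand-in: http.client only calls makefile() on it."""

    def __init__(self, data):
        self._file = BytesIO(data)

    def makefile(self, *args, **kwargs):
        return self._file


def parse_response(raw):
    """Return (status, headers, decoded_body) for a complete raw HTTP response."""
    response = HTTPResponse(FakeSocket(raw))
    response.begin()        # parses the status line and headers
    body = response.read()  # honors Content-Length and chunked transfer encoding
    if response.getheader("Content-Encoding", "").lower() == "gzip":
        body = gzip.decompress(body)
    return response.status, dict(response.getheaders()), body


# Build a small, complete response to exercise the function.
payload = gzip.compress(b"<html>hello</html>")
raw = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Encoding: gzip\r\n"
    b"Content-Length: " + str(len(payload)).encode() + b"\r\n"
    b"Content-Type: text/html\r\n"
    b"\r\n" + payload
)
status, headers, body = parse_response(raw)
print(status, headers["Content-Type"], body)
```

With sniffed traffic the response usually spans several TCP segments, so the payloads would need to be concatenated in order before calling `parse_response`.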

Blender

Thank you very much! This is exactly what I need! – user6456568 Dec 07 '17 at 06:52
Hi, I still have a question though. How do I parse the raw bytes of an HTTP request? – user6456568 Dec 20 '17 at 05:19
@user6456568: What do you mean? In my example code, `raw_response` is the raw HTTP response with a gzip-compressed body. – Blender Dec 20 '17 at 05:21
I have some raw bytes that are either an HTTP request or a response, and I want to parse both. – user6456568 Dec 20 '17 at 06:02
@user6456568: parsing HTTP requests is a different problem: https://stackoverflow.com/questions/39090366/how-to-parse-raw-http-request-in-python-3 – Blender Dec 20 '17 at 22:18
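For the request side, one commonly used approach is to reuse the parser in `http.server.BaseHTTPRequestHandler` and feed it a `BytesIO` instead of a live connection. The sketch below assumes that approach; `RawHTTPRequest` and `looks_like_response` are hypothetical names, and telling the two message types apart by the `HTTP/` prefix is a simple heuristic based on the fact that a response starts with its status line.

```python
from http.server import BaseHTTPRequestHandler
from io import BytesIO


class RawHTTPRequest(BaseHTTPRequestHandler):
    """Hypothetical helper: parse request bytes without opening a socket."""

    def __init__(self, raw):
        # Deliberately skip the base __init__, which expects a live connection.
        self.rfile = BytesIO(raw)
        self.raw_requestline = self.rfile.readline()
        self.error_code = self.error_message = None
        self.parse_request()  # fills in command, path, request_version, headers

    def send_error(self, code, message=None, explain=None):
        # Record the problem instead of writing an error reply to a socket.
        self.error_code = code
        self.error_message = message


def looks_like_response(raw):
    """Heuristic: a response starts with its status line, e.g. b'HTTP/1.1 200 OK'."""
    return raw.startswith(b"HTTP/")


raw = b"GET /gzip HTTP/1.1\r\nHost: httpbin.org\r\n\r\n"
if not looks_like_response(raw):
    request = RawHTTPRequest(raw)
    print(request.command, request.path, request.headers["Host"])
```

If the request line is malformed, `error_code` is set and the parsed attributes should not be trusted.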

score 1 · Answer 2 · answered Dec 07 '17 at 03:03

You can turn the bytes into a string with

response_bytes.decode('utf-8')

Then you can parse the returned information with Beautiful Soup for whatever part of it you want.

GaryMBloom

Thanks. Why do I get this error? `UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 302: invalid start byte` – user6456568 Dec 07 '17 at 03:12
@user6456568 - Sorry, but I'm not the best person to help with decode issues. My apologies... – GaryMBloom Dec 07 '17 at 04:29
@user6456568 Because you're dealing with a gzipped response. The body of the response is compressed, so you can't just turn it into a UTF-8 string without first decompressing the body. – hangonstack Sep 16 '22 at 20:48
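To make the point about decompression concrete, here is a rough sketch that splits the raw bytes at the blank line, decodes only the header part as text, and inflates the body separately. `split_and_inflate` is a made-up name, and the sketch assumes the body is not chunked. It uses `zlib.decompressobj` in gzip mode rather than `gzip.decompress` because a single sniffed packet often holds only the start of the compressed stream (the example in the question ends after `\x1f\x8b`), and a decompress object returns whatever it can decode so far instead of failing on truncated input.

```python
import gzip
import zlib


def split_and_inflate(raw):
    """Hypothetical helper: return (header_text, body) with gzip data inflated."""
    head, _, body = raw.partition(b"\r\n\r\n")
    # ISO-8859-1 maps every byte to a character, so this decode cannot fail.
    header_text = head.decode("iso-8859-1")

    is_gzip = False
    for line in header_text.split("\r\n")[1:]:  # skip the status line
        name, _, value = line.partition(":")
        if name.strip().lower() == "content-encoding" and value.strip().lower() == "gzip":
            is_gzip = True

    if is_gzip:
        # wbits = 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
        body = zlib.decompressobj(16 + zlib.MAX_WBITS).decompress(body)
    return header_text, body


html = b"<html>" + b"hello world " * 200 + b"</html>"
compressed = gzip.compress(html)
head = b"HTTP/1.1 200 OK\r\nContent-Encoding: gzip\r\n\r\n"

full_headers, full_body = split_and_inflate(head + compressed)
# Only the first part of the stream, as in a single captured packet.
_, partial_body = split_and_inflate(head + compressed[:40])
print(len(full_body), len(partial_body))
```

The decompressed body is still bytes; decode it with the charset from Content-Type (if any) before handing it to an HTML parser.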
