1

I use scapy to sniff packets, and some of the HTTP response packets I capture are bytes that I cannot parse. For example:

  b'HTTP/1.1 200 OK\r\nDate: Thu, 07 Dec 2017 02:44:18 GMT\r\nServer:Apache/2.4.18 (Ubuntu)\r\nLast-Modified: Tue, 14 Nov 2017 05:51:36 GMT\r\nETag: "2c39-55deafadf0ac0-gzip"\r\nAccept-Ranges: bytes\r\nVary: Accept-Encoding\r\nContent-Encoding: gzip\r\nContent-Length: 3186\r\nConnection: close\r\nContent-Type: text/html\r\n\r\n\x1f\x8b'

Is there a way to extract the content part of this byte string so I can decode it with the gzip library? I don't want to use requests to fetch the HTTP response, because I only want to process the raw packet I already have.

user6456568

2 Answers

4

There's no built-in way to parse a raw HTTP response like this and handle compression properly. I would use urllib3:

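import urllib3

from io import BytesIO
from http.client import HTTPResponse

class BytesIOSocket:
    # Minimal stand-in for a socket: http.client only calls makefile() on it.
    def __init__(self, content):
        self.handle = BytesIO(content)

    def makefile(self, mode):
        return self.handle

def response_from_bytes(data):
    sock = BytesIOSocket(data)

    # Let http.client parse the status line and headers.
    response = HTTPResponse(sock)
    response.begin()

    # urllib3 wraps the parsed response and decodes the gzip body on access.
    return urllib3.HTTPResponse.from_httplib(response)

if __name__ == '__main__':
    import socket

    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect(('httpbin.org', 80))
    # "Connection: close" makes the server end the stream, so we can read to EOF.
    sock.sendall(b'GET /gzip HTTP/1.1\r\nHost: httpbin.org\r\nConnection: close\r\n\r\n')

    # A single recv() may return only part of the response, so read until EOF.
    chunks = []
    while True:
        chunk = sock.recv(8192)
        if not chunk:
            break
        chunks.append(chunk)
    sock.close()
    raw_response = b''.join(chunks)

    response = response_from_bytes(raw_response)
    print(response.headers)
    print(response.data)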
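
If `from_httplib` is not available in your installed urllib3 version, the same idea can probably be done with the standard library alone. The following is a minimal sketch, not a definitive implementation: `FakeSocket` and `decode_http_response` are hypothetical names chosen for illustration, and it assumes the bytes hold the complete response and that the only content encoding to handle is gzip.

```python
import gzip
from http.client import HTTPResponse
from io import BytesIO


class FakeSocket:
    """Hypothetical helper: wraps bytes so http.client can read them."""

    def __init__(self, data):
        self._file = BytesIO(data)

    def makefile(self, *args, **kwargs):
        return self._file


def decode_http_response(raw):
    """Parse raw response bytes and return (status, headers, decoded body)."""
    response = HTTPResponse(FakeSocket(raw))
    response.begin()        # parses the status line and headers
    body = response.read()  # follows Content-Length or chunked encoding
    # Assumption: gzip is the only Content-Encoding we need to undo.
    if response.getheader('Content-Encoding', '').lower() == 'gzip':
        body = gzip.decompress(body)
    return response.status, dict(response.getheaders()), body
```

Note that `response.read()` raises `http.client.IncompleteRead` when the bytes contain less body than `Content-Length` announces. A single sniffed packet often carries only the first part of the body, so the TCP segments of the response would need to be reassembled before calling this.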
Blender
1

You can extract the value portion of the bytes with

response_bytes.decode('utf-8')

Then you can parse the returned information with Beautiful Soup for whatever part of it you want.

GaryMBloom
  • Thanks. Why do I get an error? `UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 302: invalid start byte` – user6456568 Dec 07 '17 at 03:12
  • @user6456568 - Sorry, but I'm not the best person to help with decode issues. My apologies... – GaryMBloom Dec 07 '17 at 04:29
  • @user6456568 Because you're dealing with a gzipped response. The body of the response is compressed, so you can't just turn it into a UTF-8 string without first decompressing the body – hangonstack Sep 16 '22 at 20:48
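
To illustrate the point made in the last comment, here is a rough sketch that decodes only the header block as text and decompresses the body separately. The `split_http_response` helper and the sample response are made up for illustration. It assumes the whole body is present and that the response does not use chunked transfer encoding.

```python
import gzip


def split_http_response(raw):
    """Split raw response bytes into (header text, body bytes)."""
    # The first blank line (CRLF CRLF) separates the headers from the body.
    head, _, body = raw.partition(b'\r\n\r\n')
    # Only the header block is text; the body stays as bytes.
    return head.decode('iso-8859-1'), body


# Hypothetical sample response, built locally for illustration.
compressed = gzip.compress(b'<html>hello</html>')
sample = (b'HTTP/1.1 200 OK\r\nContent-Encoding: gzip\r\n'
          b'Content-Type: text/html\r\n\r\n' + compressed)

header_text, body = split_http_response(sample)
html = gzip.decompress(body).decode('utf-8')  # decompress first, then decode
print(header_text.splitlines()[0])
print(html)
```

Calling `sample.decode('utf-8')` directly would fail in the same way as in the first comment, because the compressed body begins with the gzip magic bytes `\x1f\x8b`, and `0x8b` is not a valid UTF-8 start byte.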