13

I am looking for a library or function in Python (or an associated library) that would let me feed in a raw stream of text representing an HTTP request or response and get that information back in some meaningful form, such as a dictionary or list. I do not want to use some built-in class or create a bunch of new objects; my program receives raw data, and that is all I have to work with. Is there already a solution out there for this, or do I have to write an HTTP parser myself?

Edit: Let me clarify what exactly I'm looking to do. I'm looking for something that would take a string like:

GET /index.html HTTP/1.1 \r\n
Host:www.stackoverflow.com \r\n
User-Agent:Firefox \r\n
etc.

And send me back something encapsulating the method, HTTP version, headers and all the rest.

Brandon Rhodes
themaestro
  • Here are two questions that I think might address this issue (I have tried to answer them in detail there): http://stackoverflow.com/questions/2115410/does-python-have-a-module-for-parsing-http-requests-and-responses/ http://stackoverflow.com/questions/4685217/parse-raw-http-headers/ – Brandon Rhodes May 12 '11 at 19:05

3 Answers

4

The http-parser project ships a pure-Python HTTP parser as a fallback for its optimized C/Cython implementation.

Here is the pure python version:

Here is the source of the C version and the Cython wrapper:

ogrisel
1

I believe httplib is the library you are looking for: http://docs.python.org/library/httplib.html. It was renamed to http.client in Python 3, but is otherwise good to go.
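For the raw-string case in the question, here is a minimal sketch using only the standard library. It assumes Python 3 and that the bytes hold a complete request head. It does not go through a connection object: it splits off the request line by hand and hands the header lines to the email parser, which is what http.client relies on for headers internally. The function name `parse_raw_request` and the dictionary keys are made up for illustration, not part of any library.

```python
from email.parser import BytesParser


def parse_raw_request(raw: bytes) -> dict:
    """Split a raw HTTP request into method, path, version, headers and body.

    Hypothetical helper for illustration; assumes a well-formed request.
    """
    # The head (request line + headers) ends at the first blank line.
    head, _, body = raw.partition(b"\r\n\r\n")
    request_line, _, header_block = head.partition(b"\r\n")

    # Request line layout: METHOD SP request-target SP HTTP-version
    method, path, version = request_line.decode("iso-8859-1").split()

    # HTTP headers share the "Name: value" layout of email headers.
    message = BytesParser().parsebytes(header_block, headersonly=True)
    # Lower-casing the names makes lookups case-insensitive; note that a
    # plain dict keeps only the last value of a repeated header.
    headers = {name.lower(): value for name, value in message.items()}

    return {
        "method": method,
        "path": path,
        "version": version,
        "headers": headers,
        "body": body,
    }


raw = (
    b"GET /index.html HTTP/1.1\r\n"
    b"Host: www.stackoverflow.com\r\n"
    b"User-Agent: Firefox\r\n"
    b"\r\n"
)
request = parse_raw_request(raw)
print(request["method"], request["path"], request["version"])
print(request["headers"])
```

A malformed request line makes the three-way unpacking raise ValueError, so real input would need error handling around the call.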

Gabriel
  • 3
    I looked at that but could not quite find what I needed. Correct me if I'm wrong, but doesn't that lib revolve around actually making/receiving requests? I don't want to make/receive any requests, I just want to look at raw data. Could you give an example of the method you believe would do this? – themaestro Jul 09 '10 at 18:55
  • 1
    Well, the HTTP request, when you receive it, contains the raw header data, and you use this library to create a header dictionary. This is what your post describes. If you are looking to receive raw text data over a socket, you might try http://docs.python.org/library/socket.html but you will be recreating a lot of wheel parts. Conversely, if you are receiving the raw text and want a way to parse it into a valid request header, you can try http://deron.meranda.us/python/httpheader/pydoc#-parse_token_or_quoted_string but I have not tried this myself. – Gabriel Jul 10 '10 at 06:06
1

I'd start by looking at WebOb. I think the cgi module in the standard library also has an HTTP parser.

Marius Gedminas
  • Sweet, webob.Request.accept handles this perfectly: http://pythonpaste.org/webob/reference.html#accept-headers – aehlke Feb 25 '11 at 16:30
  • @Wahnfrieden — I am confused, though, about how to get a raw HTTP request inside of a string, like is shown in the question, and turn it into a WebOb object. I do not see anything in your link that suggests that it is possible. Could you share how you turn HTTP request strings into WebOb objects? (Because I need to on one of my projects!) :) – Brandon Rhodes May 11 '11 at 12:32
  • @Brandon sorry I commented prematurely - WebOb parses the part of the header that I needed (just the value of the Accept header), but I don't know about the rest of it. – aehlke May 12 '11 at 18:50