6

I've been working with HTTP headers recently. I am parsing the field name and value from HTTP request header lines based on the colon separator mandated by the RFC. In Python:

header_request_line.split(":")

However, this messes up if colons are allowed in the value fields. Consider:

User-Agent: Mozilla:4.0

which would be split into 3 strings, not 2 as I wanted.

Cheeso
jeffrey

3 Answers

4

Yes. So you can do something like this (pseudo):

header = "User-Agent: Mozilla:4.0"
headerParts = header.split(":")

key = headerParts[0]
value = header.substring(key.length + 1).trim()

// or
value = headerParts.skip(1).join(":")

But you'll probably run into various issues when parsing headers from various servers, so why not use a library?
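Since the question is about Python, the pseudocode above could be sketched roughly as follows. This is only an illustration, assuming the input is a single header line; `split_header` is a hypothetical helper name, not part of any library. It uses `str.partition`, which splits at the first occurrence of the separator only.

```python
def split_header(line):
    """Split one header line into (name, value) at the first colon only."""
    name, sep, value = line.partition(":")
    if not sep:
        # partition returns an empty separator when no colon was found
        raise ValueError("not a header line: %r" % line)
    return name.strip(), value.strip()


print(split_header("User-Agent: Mozilla:4.0"))
```

Any further colons stay in the value untouched, so nothing needs to be joined back together.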

CodeCaster
  • What kind of libraries do you suggest? My input is a single header in string format. – jeffrey Nov 14 '14 at 22:21
  • 1
    I don't know the python libraries that well, so take a look at [Parse raw HTTP Headers](http://stackoverflow.com/questions/4685217/parse-raw-http-headers). – CodeCaster Nov 14 '14 at 22:24
  • Thanks, Brandon Rhodes has a perfect solution. Unfortunately, I need to know the order of the HTTP header fields, and his solution uses a dictionary. Any ideas on how to get the ordering? – jeffrey Nov 14 '14 at 23:52
2

Yes, it can.

In your example, you can simply call split with the maxsplit parameter, like this:

header_request_line.split(":", 1)

It produces the following result and works regardless of the number of colons in the field value:

In [2]: 'User-Agent: Mozilla:4.0'.split(':', 1)
Out[2]: ['User-Agent', ' Mozilla:4.0']
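If the order of the header fields matters, the same `split(":", 1)` call can be applied line by line and the results kept in a list of pairs instead of a dictionary. A minimal sketch, assuming the input contains only header lines (no request line) and that every non-blank line has a colon; `parse_headers` is a hypothetical name used for illustration.

```python
def parse_headers(raw):
    """Return header fields as a list of (name, value) pairs, in input order."""
    pairs = []
    for line in raw.splitlines():
        if not line.strip():
            continue  # skip the blank line that ends the header block
        # maxsplit=1 keeps any later colons inside the value
        name, value = line.split(":", 1)
        pairs.append((name.strip(), value.strip()))
    return pairs


raw = "Host: example.com:8080\r\nUser-Agent: Mozilla:4.0\r\n\r\n"
for name, value in parse_headers(raw):
    print(name, "=>", value)
```

A list of pairs also keeps repeated field names, which a plain dictionary would overwrite. A line without a colon raises `ValueError` at the unpacking step.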
Oleg Kuralenko
1

Per RFC 7230, the answer is yes.

Per section 3.2.6, most header field values are built from tokens, quoted strings, and comments, separated by whitespace or delimiter characters. The colon is one of those delimiters.

So a header like

User-Agent: Mozilla:4.0

has a value that consists of two tokens (Mozilla, 4.0) separated by a colon.

Nobody asked this specifically, but in my opinion, while a colon is OK and a quoted string is OK, using a JSON string as a header value feels like poor style.

My-Header: {"foo":"bar","prop2":12345} 

This would probably work, but it doesn't comply with the intent of section 3.2.6 of RFC 7230. Specifically, { " , : are all delimiters, and some of them appear consecutively in this JSON. A generic parser of HTTP header values that conforms to RFC 7230 wouldn't be happy with that value. If your system needs that, a better idea may be to URL-encode the value.

My-Header: %7B%22foo%22%3A%22bar%22%2C%22prop2%22%3A12345%7D

But that will probably be overkill in most cases. You will probably be safe inserting JSON as an HTTP header value.
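For cases where the URL-encoding route is wanted, it can be sketched in Python with the standard library's `urllib.parse`. This is only an illustration; passing `safe=""` is a choice made here so that every reserved character, including `/`, gets percent-encoded.

```python
from urllib.parse import quote, unquote

raw_value = '{"foo":"bar","prop2":12345}'

# Percent-encode everything except unreserved characters
encoded = quote(raw_value, safe="")
print("My-Header: " + encoded)

# The receiving side reverses the encoding before parsing the JSON
decoded = unquote(encoded)
print(decoded)
```

The encoded value contains no delimiters, so a generic header parser sees it as a single token.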

Community
Cheeso