
For educational purposes I'm writing an HTTP server in C++. When receiving a request, how do I know when the client has finished sending the headers? Is there any requirement that all headers be sent in one shot? What if a client sends G, then after 5 seconds E, then T...? Should I wait with a timeout and just close the connection if it takes too long? Should I start parsing as soon as I get the first bytes, so I can tell early whether the request is invalid?

I know there are a lot of libraries for this; I'm just reinventing the wheel to better understand how the Web works at different layers. And I can't find how they deal with this exact question.

RocketR

4 Answers


According to the HTTP/1.1 RFC (RFC 2616, section 4.1):

    generic-message = start-line
                      *(message-header CRLF)
                      CRLF
                      [ message-body ]
    start-line      = Request-Line | Status-Line

There is an extra CRLF after the message headers. So once you encounter the sequence CRLF CRLF, the headers are complete and the body (if any) starts.
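A minimal sketch of that check, assuming the server appends every chunk returned by `recv` to one buffer; `find_header_end` is a hypothetical name. Because the terminator can be split across reads (the question's G ... E ... T case), the search runs on the accumulated buffer, never on the latest chunk alone.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper: returns the offset just past the blank line
// ("\r\n\r\n") that ends the request line and headers, or
// std::string_view::npos if the terminator has not been received yet.
// Call it on the *accumulated* buffer after every recv(), because
// CR LF CR LF may be split across several reads.
std::size_t find_header_end(std::string_view buffer) {
    constexpr std::string_view kTerminator = "\r\n\r\n";
    const std::size_t pos = buffer.find(kTerminator);
    if (pos == std::string_view::npos) {
        return std::string_view::npos;  // keep reading (subject to a timeout)
    }
    return pos + kTerminator.size();  // bytes from here on belong to the body
}
```

For example, `find_header_end("GET / HTTP/1.1\r\nHost: example\r\n\r")` is still `npos`, while once the final `\n` and a body arrive, `find_header_end("GET / HTTP/1.1\r\nHost: example\r\n\r\nBODY")` returns 33, and everything from offset 33 (`"BODY"`) is body data.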

Concerning timeouts: you could start parsing as soon as characters arrive (waiting for a CRLF so you know a header line is complete), and if the request takes longer than 5 seconds or so, send back a 408 Request Timeout.
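A minimal POSIX sketch of that timeout, assuming a blocking, connected socket descriptor `fd`; `read_headers` and `ReadResult` are hypothetical names. It uses one deadline for the whole header block (not one per `recv`), so a client trickling one byte every few seconds still runs out of time.

```cpp
#include <cassert>
#include <cerrno>
#include <chrono>
#include <cstddef>
#include <string>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>

// Possible outcomes of waiting for a complete header block.
enum class ReadResult { Complete, Timeout, Closed, Error };

// Sketch (POSIX only): append data from `fd` to `buffer` until the blank
// line that ends the headers arrives, or until `timeout` has elapsed since
// the call started. On Timeout the caller would send 408 and close.
ReadResult read_headers(int fd, std::string& buffer,
                        std::chrono::milliseconds timeout) {
    using Clock = std::chrono::steady_clock;
    const auto deadline = Clock::now() + timeout;
    char chunk[4096];

    while (buffer.find("\r\n\r\n") == std::string::npos) {
        const auto remaining = std::chrono::duration_cast<std::chrono::milliseconds>(
            deadline - Clock::now());
        if (remaining.count() <= 0) {
            return ReadResult::Timeout;
        }
        pollfd pfd{fd, POLLIN, 0};
        const int ready = poll(&pfd, 1, static_cast<int>(remaining.count()));
        if (ready == 0) {
            return ReadResult::Timeout;  // nothing arrived before the deadline
        }
        if (ready < 0) {
            if (errno == EINTR) continue;  // interrupted by a signal; retry
            return ReadResult::Error;
        }
        const ssize_t n = recv(fd, chunk, sizeof chunk, 0);
        if (n == 0) {
            return ReadResult::Closed;  // peer closed before finishing headers
        }
        if (n < 0) {
            if (errno == EINTR) continue;
            return ReadResult::Error;
        }
        buffer.append(chunk, static_cast<std::size_t>(n));
    }
    return ReadResult::Complete;
}
```

A caller might pass `std::chrono::seconds(5)`, answer `Timeout` with a 408 response, and simply close the socket on `Closed` or `Error`.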

Femaref
  • This doesn't answer the question. I did see RFC 2616. What if I never get a double CRLF? What if the data sent to the server is complete nonsense without any CRLF? – RocketR Jul 10 '11 at 11:00
  • @RocketR, note that the request doesn't necessarily end with the headers. – Bruno Jul 10 '11 at 11:51
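One way to handle the case raised in the first comment (the blank line never arrives, or the client sends nonsense with no CRLF at all) is to cap how many bytes the server will buffer for the request head. A minimal sketch, assuming the caller picks the limit; `HeadAccumulator` and its members are hypothetical names. It also avoids rescanning the whole buffer on every read by resuming the search three bytes before the old end, which is the most a split terminator can reach back.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical accumulator for the request head (request line + headers).
// Feed it every chunk returned by recv() until it stops answering NeedMore.
class HeadAccumulator {
public:
    enum class State { NeedMore, Complete, TooLarge };

    explicit HeadAccumulator(std::size_t max_head_bytes) : max_(max_head_bytes) {}

    State feed(std::string_view chunk) {
        assert(!done_ && "feed() called after a final state was reported");
        // A terminator can straddle the old data and the new chunk, so
        // resume the search 3 bytes before the old end instead of at 0.
        const std::size_t from = buffer_.size() >= 3 ? buffer_.size() - 3 : 0;
        buffer_.append(chunk.data(), chunk.size());
        const std::size_t pos = buffer_.find("\r\n\r\n", from);
        if (pos != std::string::npos) {
            done_ = true;
            head_end_ = pos + 4;
            return head_end_ <= max_ ? State::Complete : State::TooLarge;
        }
        if (buffer_.size() > max_) {
            done_ = true;  // too much data without a blank line: give up
            return State::TooLarge;
        }
        return State::NeedMore;
    }

    // Valid after Complete: request line and headers, including the blank line.
    std::string_view head() const { return std::string_view(buffer_).substr(0, head_end_); }

    // Valid after Complete: bytes that arrived after the blank line.
    std::string_view leftover() const { return std::string_view(buffer_).substr(head_end_); }

private:
    std::string buffer_;
    std::size_t max_;
    std::size_t head_end_ = 0;
    bool done_ = false;
};
```

On `TooLarge` the server would send an error response and close the connection instead of waiting for a terminator that may never come; combined with a timeout, this bounds both the memory and the time a single client can tie up.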

There are two parts to this answer.

First, the issue of delay and timeouts: you should indeed deal with timeouts, as it's generally not possible to detect whether a TCP connection is broken. There is more on this topic in this question: TCP socket in Unix - notify server I am done sending

Secondly, the format of an HTTP request is defined (in RFC 2616, section 5) as follows:

    Request       = Request-Line              ; Section 5.1
                    *(( general-header        ; Section 4.5
                     | request-header         ; Section 5.3
                     | entity-header ) CRLF)  ; Section 7.1
                    CRLF
                    [ message-body ]          ; Section 4.3

Essentially, you get the request line (for example GET /index.html HTTP/1.1), followed by multiple header lines (without empty lines). Then, the list of headers ends with an empty line. All line endings are CRLF ("\r\n").
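Since the request line is the first CRLF-terminated line, it can be checked as soon as that first CRLF arrives, which rejects obvious nonsense before the headers are complete. A minimal sketch; `RequestLine` and `parse_request_line` are hypothetical names, and only the overall shape and the `HTTP/` 1*DIGIT "." 1*DIGIT version syntax from RFC 2616 are checked (method and target characters are not validated).

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Parsed form of a request line such as "GET /index.html HTTP/1.1".
struct RequestLine {
    std::string method;
    std::string target;
    std::string version;
};

// Hypothetical helper: splits the first line (without its trailing CRLF)
// into three single-space-separated parts. Returns std::nullopt for any
// other shape; a server would answer that with 400 Bad Request.
std::optional<RequestLine> parse_request_line(std::string_view line) {
    const std::size_t sp1 = line.find(' ');
    if (sp1 == std::string_view::npos || sp1 == 0) return std::nullopt;
    const std::size_t sp2 = line.find(' ', sp1 + 1);
    if (sp2 == std::string_view::npos || sp2 == sp1 + 1) return std::nullopt;
    if (line.find(' ', sp2 + 1) != std::string_view::npos) return std::nullopt;

    const std::string_view version = line.substr(sp2 + 1);
    auto is_digits = [](std::string_view s) {
        if (s.empty()) return false;
        for (char c : s) {
            if (c < '0' || c > '9') return false;
        }
        return true;
    };
    // RFC 2616: HTTP-Version = "HTTP" "/" 1*DIGIT "." 1*DIGIT
    if (version.substr(0, 5) != "HTTP/") return std::nullopt;
    const std::string_view nums = version.substr(5);
    const std::size_t dot = nums.find('.');
    if (dot == std::string_view::npos || !is_digits(nums.substr(0, dot)) ||
        !is_digits(nums.substr(dot + 1))) {
        return std::nullopt;
    }

    return RequestLine{std::string(line.substr(0, sp1)),
                       std::string(line.substr(sp1 + 1, sp2 - sp1 - 1)),
                       std::string(version)};
}
```

For `"GET /index.html HTTP/1.1"` this yields method `GET`, target `/index.html` and version `HTTP/1.1`; inputs such as `"GET /index.html"` or `"GET / HTTP/x.1"` are rejected.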

In addition to this, some requests also have a body (typically those using POST or PUT). If the request has a message body, its length will be given either by the Content-Length header or using delimiters via chunked transfer encoding.
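Once the headers are split into name/value pairs, deciding how to read the body can be sketched as below, following the rules of RFC 2616 section 4.4: a `Transfer-Encoding` other than `identity` means chunked and overrides `Content-Length`, and a request with neither header has no body. `BodyFraming` and `body_framing` are hypothetical names, and the input is assumed to have names and values already trimmed.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// How the server should read whatever follows the blank line.
struct BodyFraming {
    enum class Kind { None, Length, Chunked, Invalid } kind;
    std::size_t length = 0;  // meaningful only for Kind::Length
};

// Header field names are case-insensitive, so compare them that way.
bool iequals(std::string_view a, std::string_view b) {
    if (a.size() != b.size()) return false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (std::tolower(static_cast<unsigned char>(a[i])) !=
            std::tolower(static_cast<unsigned char>(b[i]))) {
            return false;
        }
    }
    return true;
}

// Hypothetical helper: picks the body framing from (name, value) pairs.
// A malformed Content-Length is reported as Invalid (answer with 400).
BodyFraming body_framing(const std::vector<std::pair<std::string, std::string>>& headers) {
    const std::string* content_length = nullptr;
    for (const auto& [name, value] : headers) {
        if (iequals(name, "Transfer-Encoding") && !iequals(value, "identity")) {
            return {BodyFraming::Kind::Chunked};  // Content-Length is ignored
        }
        if (iequals(name, "Content-Length")) content_length = &value;
    }
    if (content_length == nullptr) return {BodyFraming::Kind::None};
    if (content_length->empty()) return {BodyFraming::Kind::Invalid};
    std::size_t n = 0;
    for (char c : *content_length) {
        if (c < '0' || c > '9') return {BodyFraming::Kind::Invalid};
        n = n * 10 + static_cast<std::size_t>(c - '0');  // no overflow check in this sketch
    }
    return {BodyFraming::Kind::Length, n};
}
```

With `Length`, the server reads exactly `length` more bytes after the blank line (including any already buffered); with `Chunked`, it has to decode chunk-size lines until the zero-size chunk.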

Bruno
  • Perhaps my question is too ambiguous. I know about the RFC, so what I was looking for is the first part. To sum up: I `recv` some portion of the headers and try to parse them. Then if I see they are not complete, I wait for more data and on timeout just close the connection, correct? – RocketR Jul 10 '11 at 11:34
  • @RocketR, Yes, that's correct. You need something to handle timeouts because you can only detect that the TCP connection you're reading from is broken (as opposed to inactive) by writing to it (but it's not your turn to try to write yet, until you've received the full request). – Bruno Jul 10 '11 at 11:43
  • As @Femaref was saying, you'd send a 408 response when there's a timeout, 400 if you detect earlier that the request is nonsense, or 411 if the client tries to send a body with the request but without `Content-Length` or chunked encoding. I'm not sure what to do if the client sends more headers than the server can handle (maybe 400). The server could send the error message before the client is done sending, and [the client should monitor this](http://tools.ietf.org/html/rfc2616#section-8.2.2). There's a note on [timeouts too](http://tools.ietf.org/html/rfc2616#section-8.1.4). – Bruno Jul 10 '11 at 12:06
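The error responses mentioned in these comments are short enough to build by hand. A minimal sketch; `error_response` is a hypothetical name, and it only knows the three statuses discussed here (anything else falls back to 400). `Connection: close` tells the client the server will close the connection after this response, and `Content-Length: 0` marks the empty body.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: builds a minimal HTTP/1.1 error response with an
// empty body, after which the server is expected to close the connection.
std::string error_response(int status) {
    const char* reason = "Bad Request";
    switch (status) {
        case 400: reason = "Bad Request"; break;
        case 408: reason = "Request Timeout"; break;
        case 411: reason = "Length Required"; break;
        default: status = 400; break;  // this sketch only knows these three
    }
    return "HTTP/1.1 " + std::to_string(status) + " " + reason +
           "\r\nConnection: close\r\nContent-Length: 0\r\n\r\n";
}
```

After writing this string to the socket, the server closes the connection rather than trying to resynchronize with a client that sent a broken or incomplete request.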

The HTTP headers are separated from the body by \r\n\r\n, i.e. a double newline. That's the only thing you can rely upon.

Mat

I suggest you read the HTTP specification. Specifically, the headers are terminated by a double newline (CRLF CRLF).

littleadv
  • HTTP/1.1: https://tools.ietf.org/html/rfc7230 ... HTTP/2: https://tools.ietf.org/html/rfc7540 – Andrew Aug 28 '17 at 14:14