I'm trying out C++ sockets for the first time, and I've hit my first obstacle. I've sent a request to Google using the send function (GET / HTTP/1.1\r\n\r\n), and now I'm trying to receive the response. My current code:

char buffer[256];
std::string result;

int resultSize = 0;
bool receive = true;
while (receive) {
    resultSize = recv(dataSocket, buffer, sizeof(buffer) - 1, 0);
    if (resultSize <= 0) {
        break; // 0: the peer closed the connection; negative: an error occurred
    }
    buffer[resultSize] = '\0'; // Terminate the chunk so it can be appended as a C string
    result += buffer;

    // Attempted end-of-data check: stop once a '\0' shows up in the received bytes
    for (int i = 0; i < resultSize; i++) {
        if (buffer[i] == '\0') {
            receive = false;
        }
    }
}

return result;

I'm using a buffer size of 256 to demonstrate the problem: if the page contains more bytes than fit in my buffer, a single recv call doesn't receive everything. I've tried looping until the data contains a null terminator ('\0'), which doesn't seem to work. I've also tried checking for empty lines ('\r\n'), which doesn't work either, since there is an empty line between the headers and the HTML content of a page.

What I have noticed is that I could possibly use the Content-Length header to solve this issue. However, I'm unsure how to get that header, since doing so requires at least one recv call, and whether there is a good, safe and efficient way to do it. I'm also not sure what to do when the response doesn't include the Content-Length header, since the program would then get stuck in an infinite loop.

So if there is a method that allows me to repeat recv until the end of an HTTP stream has been reached, I'd like to know about it.

If anyone could help me with this I'd appreciate it!

Qub1
  • "It doesn't receive everything on the first try". Why should it? Where does it say that? TCP can deliver you one byte at a time if it likes. NB you aren't checking for errors or end of stream here. And the end of an HTTP 1.1 response isn't defined by end of stream: there may be a Content-length header, or multi-parts, each with their own length. See RFC 2616. – user207421 Sep 30 '15 at 02:07
  • You should loop until `recv` returns 0. (Or -1, which means an error occurred. Or `SOCKET_ERROR` if on Windows) – user253751 Sep 30 '15 at 03:04
  • @immibis: The *correct* behavior is to stop reading when the HTTP response data tells you to stop reading. Read the response headers first (read until `\r\n\r\n` is reached), then parse the headers, and then read the rest of the response body as dictated by the headers, and stop reading only when you reach the end of the response as dictated by the headers, or when the server closes the connection, whichever is encountered first. – Remy Lebeau Sep 30 '15 at 03:13
  • @EJP Calm down, I never said I expected it to. And the whole question was about how I could detect the end of the stream. – Qub1 Sep 30 '15 at 09:29
  • @immibis This would work, however on the last loop, when it returns such a value (as there is no data left to retrieve) recv will wait 30 seconds before it returns 0, so that will cause too large delays. – Qub1 Sep 30 '15 at 09:30
  • @RemyLebeau Thanks, I'm going to try parsing the headers first and then the rest. Just one more question, what would be the correct response when the required headers are missing? And is a regex match efficient enough for this purpose? – Qub1 Sep 30 '15 at 09:30
  • @Qub1: Read [RFC 2616 Section 4 HTTP Message](http://tools.ietf.org/html/rfc2616#section-4), in particular [Section 4.2 Message Headers](http://tools.ietf.org/html/rfc2616#section-4.2) and [Section 4.4 Message Length](http://tools.ietf.org/html/rfc2616#section-4.4). 4.4 tells you exactly what headers to look for and how to process them. And no, a regex will not be good enough, as headers are case-insensitive, can be in any order, have extra whitespace surrounding them and their values, etc. You need a real parser. – Remy Lebeau Sep 30 '15 at 16:33
  • Related question: http://stackoverflow.com/questions/1011339/how-do-you-make-a-http-request-with-c – Sergey Vyacheslavovich Brunov Oct 01 '15 at 13:24
  • Related question: http://stackoverflow.com/questions/32883382/get-http-request-c-sockets-using-winsock-h – Sergey Vyacheslavovich Brunov Oct 01 '15 at 13:27
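
The header rules raised in the comments (names are case-insensitive, headers may come in any order, and values may be padded with whitespace) can be illustrated with a small lookup routine. This is only a sketch, not a full RFC 2616 parser: the function name `findContentLength` is made up for this example, it assumes the header block has already been received and is passed without the trailing blank line, and it ignores folded (multi-line) headers.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: scans a block of "Name: value\r\n" lines for Content-Length.
// Returns true and stores the value in `length` if a valid header is found.
bool findContentLength(const std::string& headers, std::size_t& length) {
    std::size_t pos = 0;
    while (pos < headers.size()) {
        std::size_t eol = headers.find("\r\n", pos);
        if (eol == std::string::npos) eol = headers.size();
        const std::string line = headers.substr(pos, eol - pos);
        pos = eol + 2; // continue after the CRLF on the next iteration

        const std::size_t colon = line.find(':');
        if (colon == std::string::npos) continue; // status line or malformed line

        // Header names are case-insensitive, so compare in lower case.
        std::string name = line.substr(0, colon);
        std::transform(name.begin(), name.end(), name.begin(),
                       [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
        if (name != "content-length") continue;

        // Strip optional spaces and tabs around the value.
        std::string value = line.substr(colon + 1);
        const std::size_t first = value.find_first_not_of(" \t");
        if (first == std::string::npos) return false;
        const std::size_t last = value.find_last_not_of(" \t");
        value = value.substr(first, last - first + 1);

        // Accept decimal digits only.
        const bool digitsOnly = std::all_of(value.begin(), value.end(),
            [](unsigned char c) { return std::isdigit(c) != 0; });
        if (!digitsOnly) return false;

        try {
            length = static_cast<std::size_t>(std::stoull(value));
        } catch (const std::out_of_range&) {
            return false; // number too large to represent
        }
        return true;
    }
    return false; // no Content-Length header present
}
```

When the function returns false, the length has to come from somewhere else, for example from the server closing the connection, as RFC 2616 Section 4.4 describes.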

1 Answer

The correct behavior is to stop reading when the HTTP response data tells you to stop reading. Read the response headers first (read until \r\n\r\n is reached), then parse the headers, and then read the rest of the response body as dictated by the headers, and stop reading only when you reach the end of the response as dictated by the headers, or when the server closes the connection, whichever is encountered first. – Remy Lebeau
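
The two phases described above could be sketched roughly as follows. This is an illustration under several assumptions, not a complete HTTP client: `HttpResponse`, `RecvFn`, `BodyLengthFn` and `readHttpResponse` are names invented for this example; `recvSome` is expected to wrap `recv` and return the number of bytes read, 0 when the peer closes the connection, or a negative value on error; `bodyLength` is a caller-supplied function that inspects the header block and returns the expected body size, or a negative value when it is unknown. Chunked transfer encoding and multipart bodies are not handled.

```cpp
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <string>

struct HttpResponse {
    std::string headers; // status line and header lines, without the blank line
    std::string body;
};

// Assumed wrapper around recv(): bytes read, 0 on close, negative on error.
using RecvFn = std::function<long(char* buffer, std::size_t capacity)>;
// Assumed header inspector: expected body size, or negative if unknown.
using BodyLengthFn = std::function<long long(const std::string& headers)>;

HttpResponse readHttpResponse(const RecvFn& recvSome, const BodyLengthFn& bodyLength) {
    const std::string terminator = "\r\n\r\n";
    std::string data;
    char buffer[256];
    std::size_t headerEnd = std::string::npos;

    // Phase 1: keep reading until the blank line that ends the headers arrives.
    while (headerEnd == std::string::npos) {
        const long n = recvSome(buffer, sizeof(buffer));
        if (n < 0) throw std::runtime_error("recv failed");
        if (n == 0) throw std::runtime_error("connection closed before end of headers");
        // Search from just before the new bytes: the terminator may span two reads.
        const std::size_t from = data.size() >= 3 ? data.size() - 3 : 0;
        data.append(buffer, static_cast<std::size_t>(n));
        headerEnd = data.find(terminator, from);
    }

    HttpResponse response;
    response.headers = data.substr(0, headerEnd);
    // Bytes received after the blank line already belong to the body.
    response.body = data.substr(headerEnd + terminator.size());

    // Phase 2: read the body until the expected size is reached or the peer closes.
    const long long expected = bodyLength(response.headers);
    while (expected < 0 || response.body.size() < static_cast<std::size_t>(expected)) {
        const long n = recvSome(buffer, sizeof(buffer));
        if (n < 0) throw std::runtime_error("recv failed");
        if (n == 0) break; // closed by the server; the body may be shorter than expected
        response.body.append(buffer, static_cast<std::size_t>(n));
    }
    // Drop any bytes past the announced length (they are not part of this response).
    if (expected >= 0 && response.body.size() > static_cast<std::size_t>(expected)) {
        response.body.resize(static_cast<std::size_t>(expected));
    }
    return response;
}
```

With the socket from the question, `recvSome` could be a lambda that calls `recv(dataSocket, buffer, capacity, 0)` and returns its result, and `bodyLength` could look up the Content-Length header. Passing the receive step in as a function also lets the loop be exercised without a network connection.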

Armali