Why recv blocks before receiving all Content-Length?

Question

I'm trying to build an HTTP server in C++. One of the conditions I use to decide how to extract the body is whether a Content-Length header is present. Here is a minimal version of the code that extracts the body using Content-Length:

req_t *Webserver::_recv(int client_fd, bool *closed)
{
    string req;
    static string rest;
    // string extracted_req;
    char buff[1024];

    // while (true) {
    // std::cout << "client_fd: " << client_fd << std::endl;
    int n = recv(client_fd, buff, 1024, 0);
    // std::cout << "n: " << n << std::endl;
    if (n == -1)
    {
        _set_error_code("500", "Internal Server Error");
        return NULL;
    }
    if (n == 0)
    {
        *closed = true;
        return NULL;
    }
    buff[n] = '\0';
    req += buff;
    req_t *extracted_req = _extract_req(client_fd, req, rest, closed);
    return extracted_req;
}


 ...
 else if (headers.find("Content-Length") != string::npos) {
    string body = extract_body_len(client_fd, rest_of_req, content_length);
}

req_t is a simple struct that contains three strings: status_line, headers, and body.

req_t *Webserver::_extract_req(int client_fd, const string &req, string &rest, bool *closed)
{

    req_t *ret;
    try
    {
        ret = new req_t;
    }
    catch (std::bad_alloc &e)
    {
        std::cerr << "\033[1;31mError:\033[0m " << e.what() << std::endl;
        exit(1);
    }

    string status_line = req.substr(0, req.find("\r\n"));
    string headers = req.substr(req.find("\r\n") + 2, req.find("\r\n\r\n") - req.find("\r\n") - 2);
    rest = req.substr(req.find("\r\n\r\n") + 4, req.size() - req.find("\r\n\r\n") - 4);

    ret->status_line = status_line;
    ret->headers = headers;

    // if method is get request body is empty
    
    // if the headers contain a Content-Length, extract that number of bytes for the body
    if (headers.find("Content-Length") != string::npos)
    {
        long long content_length = _get_content_len(headers);
        if (content_length == -1)
        {
            _set_error_code("400", "Bad Request");
            return NULL;
        }
        // subtracting the length of the body from the length of the request
        ret->body = _extract_body_len(client_fd, rest, content_length, closed);
        // if body is not complete, return an error
   ...

string extract_body_len(int client_fd, string& rest, unsigned long long len) {
    string body;
    unsigned long long total = 0;
    body = rest;
    // starting total with first bytes of body 
    total += rest.size();
    // if we have it all that's it
    if (total >= len) {
        body = rest.substr(0, len);
        rest = rest.substr(len);
        return body;
    }
     else
    {
        while (total < len)
        {
            char buf[1024];
            int ret = recv(client_fd, buf, 1024, 0);
            // after a lot of debugging, I noticed that recv starts to read fewer than 1024 bytes only when total gets close to len, so I naively added this condition.
            if (ret != 1024)
            {
               
                if ((total + ret) >= len)
                {
                    body += string(buf).substr(0, len - total);
                    rest = string(buf).substr(len - total);
                    break;
                }
            }
            if (ret == 0)
            {
                if (total == len)
                {
                    rest = "";
                    break;
                }
                // client closed connection and it's still incomplete: 400
                else
                {
                    res->status_code = "400";
                    res->status_message = "Bad Request";
                    return NULL;
                }
            }
            else if (ret == -1)
            {
                res->status_code = "500";
                res->status_message = "Internal Server Error";
                return body;
            }
            total += ret;
            body += string(buf, ret);
        }
    }
    return body;
}

Now, the problem is that I've tested requests with bodies of varying sizes (8 MB, 1.9 MB, 31 MB), and I never receive the whole body (as given by Content-Length). The pattern is as follows:

recv keeps reading the full 1024 bytes until total gets close to len, then it starts returning smaller counts. Once the difference between total and len is around 400 to 600 bytes, recv blocks (there is nothing more to read) before total == len.

That really confused me. I tried different API clients (Postman, Insomnia) and got the same results. I wondered whether Content-Length might be inaccurate, but it obviously should be accurate. What do you think the problem is? Why am I receiving fewer bytes than Content-Length?

I cannot see from your code how you set `content_length` — Steffen Ullrich, Nov 25 '22 at 13:18
I extracted the `content_length` at some point, so it's guaranteed to be set — interesting, Nov 25 '22 at 13:19
Unfortunately the code you provide so far is not sufficient to fully understand what you are doing. See [How to create a Minimal, Reproducible Example](https://stackoverflow.com/help/minimal-reproducible-example). — Steffen Ullrich, Nov 25 '22 at 13:23
@SteffenUllrich I updated my code with all the functions that I'm calling to get the request — interesting, Nov 25 '22 at 13:28

score 0 · Answer 1 · answered Nov 25 '22 at 14:07

int n = recv(client_fd, buff, 1024, 0);

The above code appears to assume that this recv call returns only the header portion of the HTTP request. Not one byte more, not one byte less.

Unfortunately, you will not find anything in your textbook on network programming that gives you any such guarantee.

Your only guarantee (presuming that there is no socket-level error and that the peer has not closed the connection) is that recv() will return a value between 1 and 1024, representing however many bytes were already received on the socket, or arrived in the first packet that it blocked and waited for.
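As a minimal sketch of what that guarantee implies for the header portion, the loop below keeps calling recv until the blank line that ends the headers appears in an accumulating buffer. The name `recv_until_headers_end` is hypothetical and not part of the question's code. The sketch assumes a blocking socket and, for brevity, places no upper limit on the header size, which a real server would want.

```cpp
#include <sys/types.h>
#include <sys/socket.h>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not from the question): calls recv() repeatedly until
// "\r\n\r\n" shows up in `buffer`. Returns the offset just past that
// terminator, or std::string::npos if the peer closed the connection or
// recv() failed before the headers were complete.
// Anything after the returned offset is the start of the body and stays in
// `buffer` for the caller.
std::size_t recv_until_headers_end(int fd, std::string &buffer)
{
    assert(fd >= 0);
    static const std::string terminator = "\r\n\r\n";
    std::size_t search_from = 0;

    for (;;)
    {
        std::size_t pos = buffer.find(terminator, search_from);
        if (pos != std::string::npos)
            return pos + terminator.size();

        // The terminator may straddle two reads, so resume the search
        // three bytes before the current end of the buffer.
        search_from = buffer.size() > 3 ? buffer.size() - 3 : 0;

        char chunk[1024];
        ssize_t n = recv(fd, chunk, sizeof chunk, 0);
        if (n <= 0)
            return std::string::npos; // 0: peer closed, -1: error

        // Append exactly n bytes; do not rely on a terminating '\0'.
        buffer.append(chunk, static_cast<std::size_t>(n));
    }
}
```

The point of the design is that the caller never assumes where one recv ends: the headers are `buffer.substr(0, end)`, and whatever follows `end` already belongs to the body.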

Using an example of a completely made up HTTP request that looks something like this:

POST /cgi-bin/upload.cgi HTTP/1.0<CR><LF>
Host: www.example.com<CR><LF>
Content-Type: application/octet-stream<CR><LF>
Content-Length: 4000<CR><LF>
<CR><LF>
[4000 octets follow]

When your web browser, or a simulated browser, sends this request, this recv call can return any value between 1 and 1024 (excluding the case of network errors).

This means that this recv call can cough up anything between:

a return value of 1, placing just the letter "P" into buff.
a return value of 1024, placing the entire HTTP header into buff, followed by as much of the initial part of the HTTP content as is needed to make 1024 bytes in total.

The shown logic is completely incapable of correctly handling all of these possibilities, and that's why it fails. It will need to be reimplemented, pretty much from scratch, using the correct logic.
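One possible shape for the body-reading part of such a rewrite is sketched below. It is an illustration under assumptions, not the only correct design: `read_body` is a hypothetical name, the socket is assumed to be blocking, and `content_length` is assumed to have been parsed from the headers already. The bytes that arrived together with the headers are passed in as `buffered` and counted toward the body first.

```cpp
#include <sys/types.h>
#include <sys/socket.h>
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not from the question). `buffered` holds whatever
// followed the headers in earlier recv() calls. On success, `body` holds
// exactly `content_length` bytes and any surplus stays in `buffered`.
// Returns false if the peer closed early or recv() failed.
bool read_body(int fd, std::string &buffered, std::size_t content_length,
               std::string &body)
{
    assert(fd >= 0);

    // Bytes already read along with the headers are part of the body.
    std::size_t take = std::min(buffered.size(), content_length);
    body.assign(buffered, 0, take);
    buffered.erase(0, take);

    while (body.size() < content_length)
    {
        char chunk[1024];
        // Never ask for more than the body still needs, so bytes of a
        // following request stay unread in the socket.
        std::size_t want = std::min(sizeof chunk, content_length - body.size());
        ssize_t n = recv(fd, chunk, want, 0);
        if (n <= 0)
            return false; // 0: connection closed early, -1: error

        // Append exactly n bytes, whatever n turned out to be.
        body.append(chunk, static_cast<std::size_t>(n));
    }
    return true;
}
```

Two details matter here. The loop makes no assumption about how many bytes each recv returns, so there is no special case for a return value other than 1024. And bytes are appended with an explicit length, which keeps binary bodies intact; constructing a string from the raw buffer, as in `string(buf)`, stops at the first zero byte.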

Good point, the first call of `recv` may not get all of the HTTP header, and I surely need to take care of that. So far, though, all the HTTP requests I've been sending with API clients get the request line and headers in the first `recv`. The problem I posted this question about is in the function `_extract_body_len`: the subsequent calls to recv never receive a total number of bytes equal to content_length, resulting in `recv` blocking at some point — interesting, Nov 25 '22 at 14:15
The reasons for that seem quite obvious. Because in those cases the first part of the body was swallowed by that `recv` call, and so those bytes get completely lost. `_extract_body_len` still gets teed up to receive the original number of bytes, for the body. But some of those bytes were already read, initially, by the broken logic that reads the headers. And the logic in `_extract_body_len` has several flaws that pile up on top of this, which will also need to be dealt with, at some point. Like I wrote, the overall logic needs to be rewritten from the ground up. — Sam Varshavchik, Nov 25 '22 at 14:53
See my [previous answers](https://stackoverflow.com/search?q=user%3A65863+http+pseudo) on the topic of reading http messages. — Remy Lebeau, Nov 26 '22 at 21:42
