I am trying to parse an HTTPS response using boost::beast::http::parser. My parser is defined like this:

boost::beast::http::parser<false, boost::beast::http::string_body> response_parser;

And the callback for async read is like this:

void AsyncHttpsRequest::on_response_read(const boost::system::error_code &error_code, uint32_t bytes_transferred)
{
    if (bytes_transferred > 0)
    {
        response_parser.put(boost::asio::buffer(data_buffer, bytes_transferred), http_error_code);
        std::cout << "Parser status: " << http_error_code.message() << std::endl;
        std::cout << "Read " << bytes_transferred << " bytes of HTTPS response" << std::endl;
        std::cout << std::string(data_buffer, bytes_transferred) << std::endl;
    }
    if (error_code)
    {
        std::cout << "Error during HTTPS response read: " << error_code.message() << std::endl;
        callback(error_code, response_parser.get());
    }
    else
    {
        if (response_parser.is_done())
        {
            callback(error_code, response_parser.get());
        }
        else
        {
            std::cout << "Response is not yet finished, reading more" << std::endl;
            read_response();
        }
    }
}

Everything works fine when the response has no body: response_parser.is_done() returns true. But when the response contains a body, it always returns false, even after the body has been fully read. The response also has a Content-Length header that matches the number of bytes in the body, so the length itself is not the problem.

Boost docs say that response_parser.is_done() should return true if "the semantics of the message indicate a body is expected, and the entire body was parsed."

When I send a request with Connection: keep-alive, I get stuck reading the response, because the server has nothing left to send and response_parser is not yet done. When I use Connection: close, my finish callback is invoked, but the parsed boost::beast::http::message has no body. However, my logging to stdout shows that the body is there and was read in full.

What do I need to do to make boost::beast::http::parser recognize body end and return true on is_done() when the number of bytes read from the body becomes equal to Content-Length?

alexozornin

1 Answer

Your expectations are right.

Background, Details And Caveats:

You can observe that it does work:


#include <boost/asio/buffer.hpp>
#include <boost/beast/http.hpp>
#include <algorithm>
#include <iomanip>
#include <iostream>
#include <random>
#include <string>
#include <string_view>
using boost::system::error_code;
namespace http = boost::beast::http;

int main() {
    std::mt19937 prng { std::random_device{}() };
    std::uniform_int_distribution<size_t> packet_size { 1, 372 };

    std::string const response = 
"HTTP/1.1 200 OK\r\n"
"Age: 207498\r\n"
"Cache-Control: max-age=604800\r\n"
"Content-Type: text/html; charset=UTF-8\r\n"
"Date: Sat, 20 Mar 2021 23:24:40 GMT\r\n"
"Etag: \"3147526947+ident\"\r\n"
"Expires: Sat, 27 Mar 2021 23:24:40 GMT\r\n"
"Last-Modified: Thu, 17 Oct 2019 07:18:26 GMT\r\n"
"Server: ECS (bsa/EB15)\r\n"
"Vary: Accept-Encoding\r\n"
"X-Cache: HIT\r\n"
"Content-Length: 1256\r\n"
"\r\n"
"<!doctype html>\n<html>\n<head>\n    <title>Example Domain</title>\n\n    <meta charset=\"utf-8\" />\n    <meta http-equiv=\"Content-type\" content=\"text/html; charset=utf-8\" />\n    <meta name=\"viewport\" content=\"width=device-width, initial-scale=1\" />\n    <style type=\"text/css\">\n    body {\n        background-color: #f0f0f2;\n        margin: 0;\n        padding: 0;\n        font-family: -apple-system, system-ui, BlinkMacSystemFont, \"Segoe UI\", \"Open Sans\", \"Helvetica Neue\", Helvetica, Arial, sans-serif;\n        \n    }\n    div {\n        width: 600px;\n        margin: 5em auto;\n        padding: 2em;\n        background-color: #fdfdff;\n        border-radius: 0.5em;\n        box-shadow: 2px 3px 7px 2px rgba(0,0,0,0.02);\n    }\n    a:link, a:visited {\n        color: #38488f;\n        text-decoration: none;\n    }\n    @media (max-width: 700px) {\n        div {\n            margin: 0 auto;\n            width: auto;\n        }\n    }\n    </style>    \n</head>\n\n<body>\n<div>\n    <h1>Example Domain</h1>\n    <p>This domain is for use in illustrative examples in documents. You may use this\n    domain in literature without prior coordination or asking for permission.</p>\n    <p><a href=\"https://www.iana.org/domains/example\">More information...</a></p>\n</div>\n</body>\n</html>\n";

    std::string const input = response + response;
    std::string_view emulated_stream = input;

    error_code ec;
    while (not emulated_stream.empty()) {
        std::cout << "== Emulated stream of " << emulated_stream.size()
                  << " remaining" << std::endl;

        http::parser<false, http::string_body> response_parser;

        while (not (ec or response_parser.is_done() or emulated_stream.empty())) {
            auto next     = std::min(packet_size(prng), emulated_stream.size());
            auto consumed = response_parser.put(
                boost::asio::buffer(emulated_stream.data(), next), ec);

            std::cout << "Consumed " << consumed << std::boolalpha
                      << "\tHeaders done:" << response_parser.is_header_done()
                      << "\tDone:" << response_parser.is_done()
                      << "\tChunked:" << response_parser.chunked()
                      << "\t" << ec.message() << std::endl;

            if (ec == http::error::need_more)
                ec.clear();

            emulated_stream.remove_prefix(consumed);
        }

        auto res = response_parser.release();

        std::cout << "== Content length " << res["Content-Length"] << " and body "
                  << res.body().length() << std::endl;
        std::cout << "== Headers: " << res.base() << std::endl;
    }

    std::cout << "== Stream depleted " << ec.message() << std::endl;
}

Prints e.g.

== Emulated stream of 3182 remaining
Consumed 101    Headers done:false  Done:false  Chunked:false   need more
Consumed 0  Headers done:false  Done:false  Chunked:false   need more
Consumed 0  Headers done:false  Done:false  Chunked:false   need more
Consumed 0  Headers done:false  Done:false  Chunked:false   need more
Consumed 0  Headers done:false  Done:false  Chunked:false   need more
Consumed 234    Headers done:true   Done:false  Chunked:false   Success
Consumed 305    Headers done:true   Done:false  Chunked:false   Success
Consumed 326    Headers done:true   Done:false  Chunked:false   Success
Consumed 265    Headers done:true   Done:false  Chunked:false   Success
Consumed 216    Headers done:true   Done:false  Chunked:false   Success
Consumed 144    Headers done:true   Done:true   Chunked:false   Success
== Content length 1256 and body 1256
== Headers: HTTP/1.1 200 OK
Age: 207498
Cache-Control: max-age=604800
Content-Type: text/html; charset=UTF-8
Date: Sat, 20 Mar 2021 23:24:40 GMT
Etag: "3147526947+ident"
Expires: Sat, 27 Mar 2021 23:24:40 GMT
Last-Modified: Thu, 17 Oct 2019 07:18:26 GMT
Server: ECS (bsa/EB15)
Vary: Accept-Encoding
X-Cache: HIT
Content-Length: 1256

== Emulated stream of 1591 remaining
Consumed 204    Headers done:false  Done:false  Chunked:false   need more
Consumed 0  Headers done:false  Done:false  Chunked:false   need more
Consumed 0  Headers done:false  Done:false  Chunked:false   need more
Consumed 131    Headers done:true   Done:false  Chunked:false   Success
Consumed 355    Headers done:true   Done:false  Chunked:false   Success
Consumed 137    Headers done:true   Done:false  Chunked:false   Success
Consumed 139    Headers done:true   Done:false  Chunked:false   Success
Consumed 89 Headers done:true   Done:false  Chunked:false   Success
Consumed 87 Headers done:true   Done:false  Chunked:false   Success
Consumed 66 Headers done:true   Done:false  Chunked:false   Success
Consumed 355    Headers done:true   Done:false  Chunked:false   Success
Consumed 28 Headers done:true   Done:true   Chunked:false   Success
== Content length 1256 and body 1256
== Headers: HTTP/1.1 200 OK
Age: 207498
Cache-Control: max-age=604800
Content-Type: text/html; charset=UTF-8
Date: Sat, 20 Mar 2021 23:24:40 GMT
Etag: "3147526947+ident"
Expires: Sat, 27 Mar 2021 23:24:40 GMT
Last-Modified: Thu, 17 Oct 2019 07:18:26 GMT
Server: ECS (bsa/EB15)
Vary: Accept-Encoding
X-Cache: HIT
Content-Length: 1256

== Stream depleted Success

Perhaps

  • your stream content is not actually valid HTTP

  • your response doesn't have a content-length header at all. In this case, after headers are done parsing, the value of need_eof() will be true:

    Depending on the contents of the header, the parser may require an end of file notification to know where the end of the body lies. If this function returns true it will be necessary to call put_eof when there will never be additional data from the input.

  • your packet sizes are too small. You can see this effect if you reduce the packet size distribution to an extreme like:

     std::uniform_int_distribution<size_t> packet_size { 1, 3 };
    

    With packets that small, nothing is ever consumed. Docs:

    In some cases there may be an insufficient number of octets in the input buffer in order to make forward progress. This is indicated by the code error::need_more. When this happens, the caller should place additional bytes into the buffer sequence and call put again. The error code error::need_more is special. When this error is returned, a subsequent call to put may succeed if the buffers have been updated

    In real code you would not keep retrying with tiny, fresh buffers. You would append newly received bytes to the ones not yet consumed, so the buffer grows until it holds enough for the parser to make progress.


BONUS: Simplify!

The good news is that you rarely need anything this complicated. In most situations you can simply call http::read or http::async_read to read directly into a response object.

This will do the whole dance with the parser under the hood without you bothering about details:


boost::beast::flat_buffer buf;
boost::system::error_code ec;
for (http::response<http::string_body> res; !ec && read(pipe, buf, res, ec); res.clear()) {
    std::cout << "== Content length " << res["Content-Length"] << " and body "
              << res.body().length() << std::endl;
    std::cout << "== Headers: " << res.base() << std::endl;
}

std::cout << "== Stream depleted " << ec.message() << "\n" << std::endl;

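That's all. The output is below. The second body length reads 2512 rather than 1256 because res.clear() resets only the header fields, so the second body is appended to the first; start each read with a fresh response object if that matters: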

== Content length 1256 and body 1256
== Headers: HTTP/1.1 200 OK
Age: 207498
Cache-Control: max-age=604800
Content-Type: text/html; charset=UTF-8
Date: Sat, 20 Mar 2021 23:24:40 GMT
Etag: "3147526947+ident"
Expires: Sat, 27 Mar 2021 23:24:40 GMT
Last-Modified: Thu, 17 Oct 2019 07:18:26 GMT
Server: ECS (bsa/EB15)
Vary: Accept-Encoding
X-Cache: HIT
Content-Length: 1256

== Content length 1256 and body 2512
== Headers: HTTP/1.1 200 OK
Age: 207498
Cache-Control: max-age=604800
Content-Type: text/html; charset=UTF-8
Date: Sat, 20 Mar 2021 23:24:40 GMT
Etag: "3147526947+ident"
Expires: Sat, 27 Mar 2021 23:24:40 GMT
Last-Modified: Thu, 17 Oct 2019 07:18:26 GMT
Server: ECS (bsa/EB15)
Vary: Accept-Encoding
X-Cache: HIT
Content-Length: 1256

== Stream depleted end of stream
sehe
  • Thank you for your great answer! I tried the version with `boost::beast::http::async_read` and managed to make it work fine! However, I still don't completely understand why my "complicated" version didn't work. I double-checked those three possible error cases you mentioned and I can claim that my HTTP response was valid, it did have a valid `Content-Length` header and input buffer size was sufficient. – alexozornin Mar 25 '21 at 06:10
  • @alexozornin if you can show a working reproducer, I'll be happy to look at it. It would be good to find when there's an actual bug in the library, as well. – sehe Mar 25 '21 at 12:34
  • Please also mention `eager` mode that could be an issue when disabled. – Mariusz Jaskółka May 16 '22 at 13:05