3

I have observed a precipitous drop in performance (and dramatic increase in CPU consumption) on migrating from Beast.1.0.0-b66 (using Boost.1.64.0) to Boost.1.67.0.Beast (i.e. with Beast integrated into Boost). No doubt I've done something wrong, but I cannot imagine what.

What was:

typedef beast::http::request<beast::http::string_body> BeastHttpRequest;

is now:

namespace http = boost::beast::http;    
typedef http::request<http::string_body> BeastHttpRequest;

and what was:

beast::http::prepare(req);
beast::http::write(stream, req);

is now:

req.prepare_payload();
http::write(stream, req);

Of course, I also had to make numerous API changes. For example:

req.fields.replace(hdrName, hdrValue);

is now:

req.set(hdrName, hdrValue);

The application works correctly -- including both SSL handshaking and proxy negotiation -- but I have to fix the spike in CPU consumption and commensurate drop in performance. I wonder if anyone knows of something obvious that I've overlooked.

EDIT: I should have mentioned that I'm using a flat_buffer for the SSL stream.

I've had a chance to profile both the "before port" and "after port" performance data. Here is the "before port" (i.e. good performance) call chain:

- 34.61% HttpRequest::send
   - 32.02% beast::http::write<boost::asio::ssl::stream<boost::asio::basic_stream_socket<boost::asio::ip::tcp, boost::asio::stream_socket_service<boost::asio::ip::tcp> >&>, true, beast::http::string_body,
      - 31.90% beast::http::write<boost::asio::ssl::stream<boost::asio::basic_stream_socket<boost::asio::ip::tcp, boost::asio::stream_socket_service<boost::asio::ip::tcp> >&>, true, beast::http::string_bo
         - 20.77% beast::http::detail::write_preparation<true, beast::http::string_body, beast::http::basic_fields<std::allocator<char> > >::init
            - 18.25% beast::http::detail::write_fields<beast::basic_streambuf<std::allocator<char> >, beast::http::basic_fields<std::allocator<char> > >
               - 10.38% beast::write<beast::basic_streambuf<std::allocator<char> >, boost::basic_string_ref<char, std::char_traits<char> > >
                  - 10.20% beast::detail::write_dynabuf<beast::basic_streambuf<std::allocator<char> >, boost::basic_string_ref<char, std::char_traits<char> > >
                     + 3.66% beast::basic_streambuf<std::allocator<char> >::prepare
                     + 2.90% beast::basic_streambuf<std::allocator<char> >::commit
                     + 1.81% boost::lexical_cast<std::string, boost::basic_string_ref<char, std::char_traits<char> > >
                     + 1.47% boost::asio::buffer_copy<beast::basic_streambuf<std::allocator<char> >::mutable_buffers_type>
               - 7.51% beast::write<beast::basic_streambuf<std::allocator<char> >, char [3]>
                  - 7.47% beast::detail::write_dynabuf<beast::basic_streambuf<std::allocator<char> >, 3ul>
                     + 3.14% beast::basic_streambuf<std::allocator<char> >::prepare
                     + 2.84% beast::basic_streambuf<std::allocator<char> >::commit
                     + 1.32% boost::asio::buffer_copy<beast::basic_streambuf<std::allocator<char> >::mutable_buffers_type>
            + 2.04% beast::http::detail::write_start_line<beast::basic_streambuf<std::allocator<char> >, beast::http::basic_fields<std::allocator<char> > >
         - 9.60% beast::http::string_body::writer::write<beast::http::detail::writef0_lambda<boost::asio::ssl::stream<boost::asio::basic_stream_socket<boost::asio::ip::tcp, boost::asio::stream_socket_serv
            - 9.57% beast::http::detail::writef0_lambda<boost::asio::ssl::stream<boost::asio::basic_stream_socket<boost::asio::ip::tcp, boost::asio::stream_socket_service<boost::asio::ip::tcp> >&>, beast:
               - 9.54% boost::asio::write<boost::asio::ssl::stream<boost::asio::basic_stream_socket<boost::asio::ip::tcp, boost::asio::stream_socket_service<boost::asio::ip::tcp> >&>, beast::detail::buffe
                  - 9.48% boost::asio::write<boost::asio::ssl::stream<boost::asio::basic_stream_socket<boost::asio::ip::tcp, boost::asio::stream_socket_service<boost::asio::ip::tcp> >&>, beast::detail::bu
                     - 8.55% boost::asio::ssl::stream<boost::asio::basic_stream_socket<boost::asio::ip::tcp, boost::asio::stream_socket_service<boost::asio::ip::tcp> >&>::write_some<boost::asio::detail::c
                        - 7.56% boost::asio::ssl::detail::io<boost::asio::basic_stream_socket<boost::asio::ip::tcp, boost::asio::stream_socket_service<boost::asio::ip::tcp> >, boost::asio::ssl::detail::wr
                           - 6.98% boost::asio::write<boost::asio::basic_stream_socket<boost::asio::ip::tcp, boost::asio::stream_socket_service<boost::asio::ip::tcp> >, boost::asio::mutable_buffers_1>
                              - 6.93% boost::asio::write<boost::asio::basic_stream_socket<boost::asio::ip::tcp, boost::asio::stream_socket_service<boost::asio::ip::tcp> >, boost::asio::mutable_buffers_1,
                                 - 6.69% boost::asio::basic_stream_socket<boost::asio::ip::tcp, boost::asio::stream_socket_service<boost::asio::ip::tcp> >::write_some<boost::asio::detail::consuming_buffer
                                    - 6.69% boost::asio::stream_socket_service<boost::asio::ip::tcp>::send<boost::asio::detail::consuming_buffers<boost::asio::const_buffer, boost::asio::mutable_buffers_1>
                                       - 6.67% boost::asio::detail::reactive_socket_service_base::send<boost::asio::detail::consuming_buffers<boost::asio::const_buffer, boost::asio::mutable_buffers_1> >
                                          - 6.51% boost::asio::detail::socket_ops::sync_send
                                             - 6.32% 0xec6d
                                                - 6.08% system_call_fastpath
                                                   - 6.07% sys_sendmsg
                                                      - 6.06% __sys_sendmsg
                                                         - 5.94% ___sys_sendmsg
                                                            + 5.84% sock_send

and here is the "after port" (i.e. poor performance) call chain:

- 53.77% HttpRequest::send
   - 53.32% boost::beast::http::write<boost::asio::ssl::stream<boost::asio::basic_stream_socket<boost::asio::ip::tcp>&>, true, boost::beast::http::basic_string_body<char, std::char_traits<char>, std::allo
      - 53.30% boost::beast::http::write<boost::asio::ssl::stream<boost::asio::basic_stream_socket<boost::asio::ip::tcp>&>, true, boost::beast::http::basic_string_body<char, std::char_traits<char>, std::a
         - 53.14% boost::beast::http::write<boost::asio::ssl::stream<boost::asio::basic_stream_socket<boost::asio::ip::tcp>&>, true, boost::beast::http::basic_string_body<char, std::char_traits<char>, std
            - 53.03% boost::beast::http::write_some<boost::asio::ssl::stream<boost::asio::basic_stream_socket<boost::asio::ip::tcp>&>, true, boost::beast::http::basic_string_body<char, std::char_traits<ch
               - 52.95% boost::beast::http::detail::write_some_impl<boost::asio::ssl::stream<boost::asio::basic_stream_socket<boost::asio::ip::tcp>&>, true, boost::beast::http::basic_string_body<char, std
                  - 36.58% boost::beast::http::serializer<true, boost::beast::http::basic_string_body<char, std::char_traits<char>, std::allocator<char> >, boost::beast::http::basic_fields<std::allocator<
                     - 35.28% boost::beast::http::serializer<true, boost::beast::http::basic_string_body<char, std::char_traits<char>, std::allocator<char> >, boost::beast::http::basic_fields<std::allocat
                        - 25.93% boost::beast::http::detail::write_some_lambda<boost::asio::ssl::stream<boost::asio::basic_stream_socket<boost::asio::ip::tcp>&> >::operator()<boost::beast::detail::buffers
                           - 25.82% boost::asio::ssl::stream<boost::asio::basic_stream_socket<boost::asio::ip::tcp>&>::write_some<boost::beast::detail::buffers_ref<boost::beast::buffers_prefix_view<boost:
                              - 25.69% boost::asio::ssl::detail::io<boost::asio::basic_stream_socket<boost::asio::ip::tcp>, boost::asio::ssl::detail::write_op<boost::beast::detail::buffers_ref<boost::beas
                                 - 22.20% boost::asio::write<boost::asio::basic_stream_socket<boost::asio::ip::tcp>, boost::asio::mutable_buffer>
                                    - 22.13% boost::asio::write<boost::asio::basic_stream_socket<boost::asio::ip::tcp>, boost::asio::mutable_buffer, boost::asio::detail::transfer_all_t>
                                       - 21.96% boost::asio::detail::write_buffer_sequence<boost::asio::basic_stream_socket<boost::asio::ip::tcp>, boost::asio::mutable_buffer, boost::asio::mutable_buffer
                                          - 21.47% boost::asio::basic_stream_socket<boost::asio::ip::tcp>::write_some<boost::asio::const_buffers_1>
                                             - 21.34% boost::asio::detail::reactive_socket_service_base::send<boost::asio::const_buffers_1>
                                                - 21.17% boost::asio::detail::socket_ops::sync_send
                                                   - 20.45% 0xec6d
                                                      - 19.63% system_call_fastpath
                                                         - 19.57% sys_sendmsg
                                                            - 19.52% __sys_sendmsg
                                                               - 19.16% ___sys_sendmsg
                                                                  + 18.90% sock_sendmsg
badfd
  • 63
  • 7
  • There is far too little information to go on here. Obviously changing the namespace qualification isn't going to hurt performance. Have you tried profiling both versions and comparing the bottle necks? – sehe Apr 25 '18 at 18:11
  • 1
    I agree -- not much to go on. Beast is just a small part of a rather large application. Requests are composed and sent, as shown. Reads are just as mundane `http::read(stream, buffer, res)`. I'll try to provide a trivial example that exhibits the problem. – badfd Apr 25 '18 at 21:13
  • I just edited the original post to include call chains for both `beast::http:write<>()` and `boost::beast::http::write<>()`. – badfd Apr 26 '18 at 21:57
  • What are these percentages? (Also, are you profiling a debug build?) – sehe Apr 26 '18 at 22:50
  • These percentages come from `perf`. I ran my (debug build) app with `perf record -g` for both cases, and these percentages represent the relative time spent sending HTTP requests. As you can see (bottom) the time spent by the poor performer in `sock_sendmsg()` is triple that of the good performer. – badfd Apr 26 '18 at 23:40

1 Answers1

6

It should be faster, not slower, as the HTTP algorithms were optimized. I think I know what is going on though. It would be helpful if you could compare the performance of the two versions when using regular sockets and not SSL. boost::asio::ssl::stream has a shortcoming which is that when writing buffer sequences of length greater than one, it writes to the socket for each buffer in the sequence instead of coalescing those encrypted buffers into a single write. This can have significant performance impact.

This really needs to be fixed in Boost.Asio but a workaround is that you can write your own stream wrapper whose write algorithm, when presented with buffer sequences whose length is greater than one, creates a new sequence of length one by using memcpy and dynamic allocation. I would also open an issue here: https://github.com/boostorg/asio/issues

Why the difference in the two versions of Beast? The older version of Beast allocated memory during serialization to hold a linear version of the message. The current version uses an algorithm which requires no memory allocations at all. It is faster in the normal case, but as you have discovered when working with ssl::stream types it can be slower.

See https://github.com/boostorg/beast/issues/1108

Vinnie Falco
  • 5,173
  • 28
  • 43
  • Thanks, Vinnie. I'll investigate that possibility. – badfd Apr 26 '18 at 23:42
  • I can try to implement the workaround, are you interested? – Vinnie Falco Apr 27 '18 at 03:05
  • Very interested! :-) That would be great, and very appreciated. – badfd Apr 27 '18 at 10:43
  • 1
    @badfd This pull request has a new class `flat_stream` which coalesces outgoing buffers, example of use is in the `ssl_stream` class. If you could continue the related discussion on the pull request itself, via comments, that would be very helpful: https://github.com/boostorg/beast/pull/1110 – Vinnie Falco Apr 28 '18 at 18:21
  • Thanks very much! Your `flat_stream` worked like a charm. I look forward to its merging into the main library; meanwhile, I'll patch my code. – badfd May 01 '18 at 18:31