1

I'm currently trying to build an HTTP server with Boost.Asio, modeled on the HTTP Server 3 example.

Currently I just read the request and always return an OK message, so nothing special or time-consuming.

The problem I've run into: running the server with 12 threads (16 cores @ 2.53GHz), it handles only around 200-300 requests per second.

I did the same in C# using HttpListener: running with 12 threads, it handles around 5000-7000 requests per second.

What the heck is Boost.Asio doing?

Using Instrumentation Profiling with Visual Studio, I get the following "Functions With Most Individual Work":

Name                         Exclusive Time %
GetQueuedCompletionStatus               44,46
std::_Lockit::_Lockit                   14,54
std::_Container_base12::_Orphan_all      3,46
std::_Iterator_base12::~_Iterator_base12 2,06

Edit 1:

if (!err) {
  //Add data to client request
  if(client_request_.empty())
    client_request_ = std::string(client_buffer_.data(), bytes_transferred);
  else
    client_request_ += std::string(client_buffer_.data(), bytes_transferred);
  //Check if headers complete
  client_headerEnd_ = client_request_.find("\r\n\r\n");
  if(client_headerEnd_ == std::string::npos) {
    //Headers not yet complete, read again
    client_socket_.async_read_some(boost::asio::buffer(client_buffer_),
        boost::bind(&session::handle_client_read_headers, shared_from_this(),
          boost::asio::placeholders::error,
          boost::asio::placeholders::bytes_transferred)); 
  } else { 
    //Search Cookie
    std::string::size_type loc=client_request_.find("Cookie");
    if(loc != std::string::npos) {
      //Found Cookie
      std::string::size_type locend=client_request_.find_first_of("\r\n", loc);
      if(locend != std::string::npos) {
        std::string lCookie = client_request_.substr(loc, (locend-loc));
        loc = lCookie.find(": ");
        if(loc != std::string::npos) {
          std::string sCookies = lCookie.substr(loc+2);
          std::vector<std::string> vCookies;
          boost::split(vCookies, sCookies, boost::is_any_of(";"));
          for (std::size_t i = 0; i < vCookies.size(); ++i) {
            std::vector<std::string> vCookie;
            boost::split(vCookie, vCookies[i], boost::is_any_of("="));
            if(vCookie[0].compare("sessionid") == 0) {
              if(vCookie.size() > 1) {
                client_sessionid_ = vCookie[1];
                break;
              }
            }
          }
        }
      }
    }
    //Search Content-Length
    loc=client_request_.find("Content-Length");
    if(loc == std::string::npos) {
      //No Content-Length, no Content? -> stop further reading
      send_bad_request();
      return;
    }
    else {
      //Parse Content-Length, for further body reading
      std::string::size_type locend=client_request_.find_first_of("\r\n", loc);
      if(locend == std::string::npos) {
        //Couldn't find header end, can't parse Content-Length -> stop further reading
        send_bad_request();
        return;
      }
      std::string lHeader = client_request_.substr(loc, (locend-loc));
      loc = lHeader.find(": ");
      if(loc == std::string::npos) {
        //Couldn't find colon, can't parse Content-Length -> stop further reading
        send_bad_request();
        return;
      }
      //Save Content-Length
      client_request_content_length_ = boost::lexical_cast<std::string::size_type>(lHeader.substr(loc+2));
      //Check if already read complete body
      //The body starts 4 bytes after client_headerEnd_ (past the "\r\n\r\n" terminator)
      if((client_request_.size()-(client_headerEnd_+4)) < client_request_content_length_) {
        //Content-Length greater than current body, start reading.
        client_socket_.async_read_some(boost::asio::buffer(client_buffer_),
            boost::bind(&session::handle_client_read_body, shared_from_this(),
            boost::asio::placeholders::error,
            boost::asio::placeholders::bytes_transferred));
      }
      else {
        //Body is complete, start handling
        handle_request();
      }
    }
  }
}

Edit 2:

The client used for testing is a simple C# application that starts 128 threads, each of which iterates 1000 times without any sleep.

System.Net.HttpWebRequest req = (System.Net.HttpWebRequest)System.Net.WebRequest.Create(BaseUrl);
req.Method = "POST";
byte[] buffer = Encoding.ASCII.GetBytes("{\"method\":\"User.Login\",\"params\":[]}");
req.GetRequestStream().Write(buffer, 0, buffer.Length);
req.GetRequestStream().Close();
RaphaelH
  • Simply rewrite the code; it's not optimal. The request_parser in the example parses byte by byte and push_backs each byte into a string, without any reserve. There is also an undocumented issue with asio::strand. – ForEveR Aug 29 '13 at 09:21
  • Difficult to tell without seeing some real code.. have you tried profiling it? Also, have you compiled it with any optimisations? – Nim Aug 29 '13 at 09:22
  • This is often a problem with leaving Nagle turned on. See: http://stackoverflow.com/questions/2039224/poor-boost-asio-performance/2039378#2039378 – janm Aug 29 '13 at 09:30
  • @ForEveR Even if request_parser isn't optimal, that doesn't explain why it is so extremely slow. – RaphaelH Aug 29 '13 at 09:37
  • @Nim Have a look at the Example HTTP Server 3 – RaphaelH Aug 29 '13 at 09:38
  • @janm Also tried that with acceptor_.set_option(boost::asio::ip::tcp::no_delay(true)); – RaphaelH Aug 29 '13 at 09:39
  • I don't understand - 'GetQueuedCompletionStatus' is a blocking 'pop' call on the IOCP queue - why is it doing so much work? – Martin James Aug 29 '13 at 10:16
  • Boost.Asio performs badly with many threads. Please consider: http://stackoverflow.com/questions/1234750/c-socket-server-unable-to-saturate-cpu – inkooboo Aug 29 '13 at 11:47
  • Please add a description of the HTTP client used for testing. – free_coffee Aug 29 '13 at 13:09
  • @free_coffee see Edit 2 – RaphaelH Aug 29 '13 at 13:27
  • @RaphaelH ok, but what C# APIs does it use to send the HTTP requests? Is it small enough to paste the code used for each iteration? – free_coffee Aug 29 '13 at 13:36
  • @free_coffee I don't think that matters, because in my tests against a C# HTTP server it handled enough requests. But the lines are in the edit ;) – RaphaelH Aug 29 '13 at 13:41
  • @RaphaelH it might matter if both the C# client and server use HTTP keepalive. AFAIK the Boost.Asio HTTP example closes the socket after every request. – free_coffee Aug 29 '13 at 13:58
  • While the abstraction provided by Boost.Asio will contribute to some delays, the HTTP protocol is processed within user mode. On the other hand, `HttpListener` is tailored to a specific OS and the protocol is processed in an HTTP stack within kernel mode. It may be interesting to compare the results of the Boost.Asio application with one that uses a `TcpListener` and processes the HTTP protocol in user mode. – Tanner Sansbury Aug 30 '13 at 00:17
  • @TannerSansbury I just tried using TcpListener and parsing the request like in C++, but I still reach 4500 requests per second in C#. To me it seems like the C++ Boost.Asio server does everything single-threaded. – RaphaelH Aug 30 '13 at 07:09

3 Answers

6

The slowness is probably because the Boost.Asio HTTP Server 3 example always closes the connection after each response, forcing the client to create a new connection for each request. Opening and closing a connection on every request takes a lot of time. Obviously, this cannot outperform a server that supports HTTP/1.1 and Keep-alive (basically, one that doesn't close the client connection and allows the client to reuse it for subsequent requests).

Your C# server, System.Net.HttpListener, does support Keep-alive. The client, System.Net.HttpWebRequest, also has Keep-alive enabled by default. So, the connections are reused in this configuration.

Adding keep-alive to HTTP Server 3 example is straightforward:

  1. Inside connection::handle_read(), check whether the client requested Keep-alive and store this flag within the connection.

  2. Change connection::handle_write() so that it initiates graceful connection closure only when the client doesn't support Keep-alive; otherwise, just initiate async_read_some() like you already do in connection::start():

    socket_.async_read_some(boost::asio::buffer(buffer_),
        strand_.wrap(
            boost::bind(&connection::handle_read, shared_from_this(),
                boost::asio::placeholders::error,
                boost::asio::placeholders::bytes_transferred)));
    

And don't forget to clear your request/reply and reset the request_parser before calling async_read_some().
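
As a rough illustration of step 1, the keep-alive check could look something like the sketch below. Both helper names (has_connection_token, wants_keep_alive) are hypothetical and not part of the HTTP Server 3 example; the sketch only assumes that the parsed request exposes the HTTP version numbers and the value of the Connection header (an empty string if the header is absent). It follows the usual HTTP rule: HTTP/1.1 connections are persistent unless the client sends "close", while HTTP/1.0 connections are persistent only if the client sends "keep-alive".

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper (not part of the HTTP Server 3 example).
// Returns true if any comma-separated token of the Connection header value
// equals `wanted`. `wanted` must be lowercase; tokens are trimmed and
// lowercased before comparison.
bool has_connection_token(const std::string& header_value, const std::string& wanted) {
    std::istringstream tokens(header_value);
    std::string token;
    while (std::getline(tokens, token, ',')) {
        const std::size_t first = token.find_first_not_of(" \t");
        if (first == std::string::npos) continue;  // empty or blank token
        const std::size_t last = token.find_last_not_of(" \t");
        token = token.substr(first, last - first + 1);
        std::transform(token.begin(), token.end(), token.begin(),
                       [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
        if (token == wanted) return true;
    }
    return false;
}

// Hypothetical helper: decides whether the connection should stay open
// after the response has been written.
bool wants_keep_alive(int version_major, int version_minor,
                      const std::string& connection_header) {
    const bool http11_or_later =
        version_major > 1 || (version_major == 1 && version_minor >= 1);
    if (http11_or_later) {
        // Persistent by default; the client opts out with "close".
        return !has_connection_token(connection_header, "close");
    }
    // HTTP/1.0: the client has to opt in with "keep-alive".
    return has_connection_token(connection_header, "keep-alive");
}
```

The result would be stored in the connection once the request has been parsed, and handle_write() would then either start the next async_read_some() or shut the socket down.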

vond
  • As I wrote already in the main comments, I tried using .net TcpListener and parsing the Request like I did in C++, but still I reach easily 4500 requests per second. And there I also closed every single connection. – RaphaelH Dec 03 '13 at 07:19
  • I still don't get what is the point in testing performance of an HTTP server without keepalive. None of the web servers work that way because it's too slow. – vond Dec 03 '13 at 08:47
  • I understand your objection, but this is about my confusion over .NET being that much faster. – RaphaelH Dec 03 '13 at 08:52
  • If you have any interest in comparing apples to apples, either use keepalive both in C# and in Boost.Asio, or disable it in both of them. – vond Dec 03 '13 at 09:00
  • As I said, I tried comparing it using a TcpListener instead of HttpListener, which I think does not support HTTP Keep Alive. – RaphaelH Dec 03 '13 at 09:20
  • *bump* @RaphaelH - were you able to try out vond's suggestion? – quixver Mar 31 '14 at 09:33
  • @quixver Not yet. But if you're interested in it, give it a try on your own! Maybe you'll figure something out – RaphaelH Apr 10 '14 at 20:27
2

It seems that client_request_.find("\r\n\r\n"); is called repeatedly, hunting for the end token from the beginning of the string on every read. Use a starting position instead, such as client_request_.find("\r\n\r\n", lastposition); (derived from bytes_transferred, backing up three bytes in case the token straddles two reads).

It's also possible to use async_read_until( ,"\r\n\r\n"); found here

or async_read, which should read all (instead of some).
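
A minimal sketch of the incremental search, using only std::string. The helper name append_and_find_header_end is hypothetical; it would stand in for the append-then-find sequence in the read handler. The search starts three bytes before the newly appended data, because the four-byte terminator may be split across two reads.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: appends a freshly received chunk to the accumulated
// request and looks for the header terminator "\r\n\r\n" only in the new data
// (plus the last three old bytes, since the terminator can straddle two reads).
// Returns the position of the terminator, or std::string::npos if the headers
// are not complete yet.
std::string::size_type append_and_find_header_end(std::string& request,
                                                  const char* data,
                                                  std::size_t size) {
    const std::string::size_type old_size = request.size();
    request.append(data, size);
    const std::string::size_type search_from = old_size > 3 ? old_size - 3 : 0;
    return request.find("\r\n\r\n", search_from);
}
```

In the handler this would be called as append_and_find_header_end(client_request_, client_buffer_.data(), bytes_transferred), so each byte of the request is scanned roughly once instead of once per read.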

Gabe Rainbow
0

About the HTTP Server 3 example: look at the request_parser source code, specifically the parse/consume methods. It is really not optimal, because it takes data from the buffer byte by byte and processes each byte, pushing it into a std::string using push_back and so on. It's just an example.

Also, if you are using asio::strand, notice that it uses a mutex to lock the "strand implementation". For an HTTP server it is easily possible to remove asio::strand altogether, so I recommend doing this. If you want to stay with strands, you can set these defines at compile time to avoid delays on locking:

-DBOOST_ASIO_STRAND_IMPLEMENTATIONS=30000 -DBOOST_ASIO_ENABLE_SEQUENTIAL_STRAND_ALLOCATION
Galimov Albert
  • Just removed all "strand"'s, but still extremely slow. I'm not using the tutorial request_parser, added the code that I'm using. – RaphaelH Aug 29 '13 at 11:26