7

I would like to use the very convenient Boost async_read_until to read a message until I get the \r\n\r\n delimiter.

I like using this delimiter because it's easy to debug with telnet and it allows multiline commands. I just signal the end of a command with two newlines.

I call async_read_until like this:

void do_read()
{
    boost::asio::async_read_until(m_socket,
                                  m_input_buffer,
                                  "\r\n\r\n",
                                  std::bind(&player::handle_read, this, std::placeholders::_1, std::placeholders::_2));
}

And my handler looks like this at the moment:

void handle_read(boost::system::error_code ec, std::size_t nr)
{
    std::cout << "handle_read: ec=" << ec << ", nr=" << nr << std::endl;

    if (ec) {
        std::cout << "  -> emit on_disconnect\n";
    } else {
        std::istream iss(&m_input_buffer);
        std::string msg;
        std::getline(iss, msg);

        std::cout << "dump:\n";
        std::copy(msg.begin(), msg.end(), std::ostream_iterator<int>(std::cout, ", "));
        std::cout << std::endl;

        do_read();
    }
}

I wanted to use std::getline just like the example, but on my system this keeps the \r character. As you can see, if I connect to the server and write hello plus two CRLF, I get this dump server side:

handle_read: ec=system:0, nr=9
dump:
104, 101, 108, 108, 111, 13, 
                         ^^^ \r here

By the way, this will also keep the next new line in the buffer. So I think that std::getline will not do the job for me.
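The behaviour described above can be reproduced without any sockets. The following is only a minimal sketch using a plain std::istringstream in place of the streambuf; getline_result and getline_once are hypothetical names made up for this illustration.

```cpp
#include <iterator>
#include <sstream>
#include <string>

// Hypothetical helper type: what one std::getline call returned and what it
// left unread in the stream.
struct getline_result {
    std::string line;      // text returned by std::getline
    std::string leftover;  // bytes still unread afterwards
};

// Runs a single std::getline over `input`, the same call the handler makes.
getline_result getline_once(const std::string& input)
{
    std::istringstream iss(input);
    getline_result r;

    // std::getline stops at '\n' and discards it; the preceding '\r' is kept.
    std::getline(iss, r.line);

    // Everything after the first '\n' is still sitting in the stream.
    r.leftover.assign(std::istreambuf_iterator<char>(iss),
                      std::istreambuf_iterator<char>());
    return r;
}
```

Calling getline_once("hello\r\n\r\n") yields the line "hello\r" (six characters) and leaves "\r\n" unread, which matches the dump above.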

I am looking for a convenient and efficient way to read from the boost::asio::streambuf until I get this \r\n\r\n delimiter. Since I issue only one async_read_until at a time, when the handler is called, the buffer is supposed to have the exact and full data isn't it? What do you recommend to read until I get \r\n\r\n?

Arunmu
markand

3 Answers

17

The async_read_until() operation commits all data read into the streambuf's input sequence, and the bytes_transferred value will contain the number of bytes up to and including the first delimiter. While the operation may read more data beyond the delimiter, one can use the bytes_transferred and delimiter size to extract only the desired data. For example, if cmd1\r\n\r\ncmd2 is available to be read from a socket, and an async_read_until() operation is initiated with a delimiter of \r\n\r\n, then the streambuf's input sequence could contain cmd1\r\n\r\ncmd2:

    ,--------------- buffers_begin(streambuf.data())
   /   ,------------ buffers_begin(streambuf.data()) + bytes_transferred
  /   /                - delimiter.size()
 /   /       ,------ buffers_begin(streambuf.data()) + bytes_transferred
/   /       /   ,--  buffers_end(streambuf.data())
cmd1\r\n\r\ncmd2

As such, one could extract cmd1 into a string from the streambuf via:

// Extract up to the first delimiter.
std::string command{
  boost::asio::buffers_begin(streambuf.data()),
  boost::asio::buffers_begin(streambuf.data()) + bytes_transferred
    - delimiter.size()};
// Consume through the first delimiter.
streambuf.consume(bytes_transferred);

Here is a complete example demonstrating constructing std::string directly from the streambuf's input sequence:

#include <cassert>    // assert
#include <functional> // std::bind
#include <iostream>
#include <string>
#include <boost/asio.hpp>

const auto noop = std::bind([]{});

int main()
{
  using boost::asio::ip::tcp;
  boost::asio::io_service io_service;

  // Create all I/O objects.
  tcp::acceptor acceptor(io_service, tcp::endpoint(tcp::v4(), 0));
  tcp::socket socket1(io_service);
  tcp::socket socket2(io_service);

  // Connect sockets.
  acceptor.async_accept(socket1, noop);
  socket2.async_connect(acceptor.local_endpoint(), noop);
  io_service.run();
  io_service.reset();

  const std::string delimiter = "\r\n\r\n";

  // Write two commands from socket1 to socket2.
  boost::asio::write(socket1, boost::asio::buffer("cmd1" + delimiter));
  boost::asio::write(socket1, boost::asio::buffer("cmd2" + delimiter));

  // Read a single command from socket2.
  boost::asio::streambuf streambuf;
  boost::asio::async_read_until(socket2, streambuf, delimiter,
    [delimiter, &streambuf](
      const boost::system::error_code& error_code,
      std::size_t bytes_transferred)
    {
      // Verify streambuf contains more data beyond the delimiter. (e.g.
      // async_read_until read beyond the delimiter)
      assert(streambuf.size() > bytes_transferred);

      // Extract up to the first delimiter.
      std::string command{
        buffers_begin(streambuf.data()),
        buffers_begin(streambuf.data()) + bytes_transferred
          - delimiter.size()};

      // Consume through the first delimiter so that subsequent async_read_until
      // will not reiterate over the same data.
      streambuf.consume(bytes_transferred);

      assert(command == "cmd1");
      std::cout << "received command: " << command << "\n"
                << "streambuf contains " << streambuf.size() << " bytes."
                << std::endl;
    }
  );
  io_service.run();
}

Output:

received command: cmd1
streambuf contains 8 bytes.
Tanner Sansbury
  • And what if the actual buffer was read to `cmd1\r\n\r\ncmd2\r\n\r\n`? The second command will be parsed only when a third command or more is sent. So we need to loop while there is some data available, don't we? – markand Nov 21 '16 at 14:54
  • @markand No. The most elegant solution is to issue another `async_read_until` operation. As the buffer already contains the delimiter, the completion condition will be detected before attempting I/O, and the completion handler will be ready to run. – Tanner Sansbury Nov 21 '16 at 16:19
  • Probably something like [this](http://coliru.stacked-crooked.com/a/51b5be0caf331187)? – markand Nov 22 '16 at 08:19
  • @markand Yes, chaining the `async_read_until` in that manner is the idiomatic way to handle it. – Tanner Sansbury Nov 22 '16 at 14:48
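The reason chaining works can be illustrated with a socket-free model. This is only a sketch under stated assumptions: a std::string stands in for the streambuf's input sequence, and take_command is a hypothetical function playing the role of one async_read_until completion. It is not part of Boost.Asio.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for one async_read_until completion. `pending` models
// the streambuf's input sequence. Returns false when no complete delimiter is
// buffered yet (the real operation would then wait for more socket data).
bool take_command(std::string& pending, const std::string& delimiter,
                  std::string& command)
{
    assert(!delimiter.empty());

    const std::size_t pos = pending.find(delimiter);
    if (pos == std::string::npos)
        return false;

    // Same meaning as the handler argument: the number of bytes up to and
    // including the first delimiter.
    const std::size_t bytes_transferred = pos + delimiter.size();

    // Copy everything before the delimiter, then drop command and delimiter
    // from the front (the analogue of streambuf.consume(bytes_transferred)).
    command.assign(pending, 0, bytes_transferred - delimiter.size());
    pending.erase(0, bytes_transferred);
    return true;
}
```

With pending holding "cmd1\r\n\r\ncmd2\r\n\r\n", the first call yields "cmd1" and leaves 8 bytes, the second call yields "cmd2" from data that was already buffered, and the third call returns false. Each call handles exactly one command, which is why one handler invocation per chained read is enough and no inner loop is needed.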
0

To answer your questions first:

the buffer is supposed to have the exact and full data isn't it?

Yes, it will have all the data up to and including "\r\n\r\n", though it may also hold bytes received after the delimiter.

What do you recommend to read until I get \r\n\r\n?

What you are doing is fine. You just need to ignore the extra '\r' at the end of each line. You can do this either while reading from the stream or in the command processor (or whatever does the command processing for you). My recommendation is to defer the removal of the extra '\r' to the command processor.

You probably need something along the lines of:

#include <iostream>
#include <string>
#include <sstream>

void handle_read()
{
  std::stringstream oss;
  oss << "key : value\r\nkey2: value2\r\nkey3: value3\r\n\r\n";
  std::string parsed;

  while (std::getline(oss, parsed)) {
    // Check if it's an empty line.
    if (parsed == "\r") break;
    // Remove the additional '\r' here or at command processor code.
    // The empty() check avoids indexing into an empty string.
    if (!parsed.empty() && parsed.back() == '\r') parsed.pop_back();
    std::cout << parsed << std::endl;
    std::cout << parsed.length() << std::endl;
  }

}

int main() {
    handle_read();
    return 0;
}

If your protocol allows empty commands, you will have to change this logic and look for two consecutive empty lines instead.
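One possible shape for a command processor that removes the '\r' itself is sketched below. It assumes the terminating "\r\n\r\n" has already been cut off (for example by using the byte count that async_read_until reports), so an empty command simply arrives as an empty block. split_lines is a hypothetical helper name, not something from Boost.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: `block` is one command with the final "\r\n\r\n"
// already removed. Splits on '\n' and drops a trailing '\r' from each line.
std::vector<std::string> split_lines(const std::string& block)
{
    std::vector<std::string> lines;
    std::string::size_type start = 0;

    while (start <= block.size()) {
        // Find the end of the current line, or the end of the block.
        std::string::size_type end = block.find('\n', start);
        if (end == std::string::npos)
            end = block.size();

        std::string line = block.substr(start, end - start);
        if (!line.empty() && line.back() == '\r')
            line.pop_back();  // strip the '\r' left over from "\r\n"

        lines.push_back(line);
        start = end + 1;
    }
    return lines;
}
```

For "key : value\r\nkey2: value2\r\nkey3: value3" this returns three lines without any '\r', and for an empty block it returns a single empty line, so an empty command can be told apart from missing data.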

Arunmu
  • Thank you, I've implemented your idea with a temporary buffer and cleaning up the string while iterating. http://ideone.com/KWj32L. I will add some tests about empty strings as well. – markand Nov 12 '16 at 13:08
0

What do you actually wish to parse?

Of course, you could just use knowledge from your domain and say

std::getline(iss, msg, '\r');
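Note that using '\r' as the delimiter leaves the following '\n' in the stream. A minimal sketch of one way to deal with that is below; read_crlf_line is a hypothetical helper name, and any std::istream (including one built on the streambuf) is assumed as input.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: reads up to the next '\r', then discards the '\n'
// that follows it, if there is one. Returns false once nothing is left.
bool read_crlf_line(std::istream& in, std::string& line)
{
    if (!std::getline(in, line, '\r'))
        return false;

    // Skip the '\n' half of the CRLF pair so the next read starts cleanly.
    if (in.peek() == '\n')
        in.ignore();
    return true;
}
```

On the input "hello\r\n\r\n" the first call returns "hello", the second returns an empty line (the end-of-command marker), and the third returns false.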

At a higher level, consider parsing what you need:

std::istringstream linestream(msg);
std::string command;
int arg;
if (linestream >> command >> arg) {
    // ...
}

Even better, consider a parser generator such as Boost.Spirit Qi:

std::string command;
int arg;

if (qi::phrase_parse(msg.begin(), msg.end(), command_ >> qi::int_, qi::space, command, arg))
{
    // ...
}

Where command_ could be like

qi::rule<std::string::const_iterator> command_ = qi::no_case [ 
     qi::lit("my_cmd1") | qi::lit("my_cmd2") 
  ];
sehe