
Suppose a client holds two different big objects (in terms of byte size), serializes them, and then sends the serialized objects to a server over a TCP/IP connection using boost::asio.

  • For the client-side implementation, I'm using boost::asio::write to send binary data (const char*) to the server.

  • For the server-side implementation, I'm using read_some rather than boost::asio::ip::tcp::iostream, with future efficiency improvements in mind. I built the following recv function on the server side. Its second parameter, std::stringstream &is, holds the whole received data (>65536 bytes) by the end of the function.

When the client calls boost::asio::write twice in a row to send the two binary objects separately, the server calls the corresponding recv twice as well. However, the first recv call absorbs both incoming objects, while the second call receives nothing ;-(. I am not sure why this happens or how to solve it.

Since each object has its own (de)serialization function, I'd like to send each one separately. In fact, there are more than 20 objects (not just 2) that have to be sent over the network.

void recv (
    boost::asio::ip::tcp::socket &socket,
    std::stringstream &is) {

    boost::array<char, 65536> buf;

    for (;;) {
        boost::system::error_code error;
        size_t len = socket.read_some(boost::asio::buffer(buf), error);
        std::cout << " read "<< len << " bytes" << std::endl;  // called multiple times for debugging!

        if (error == boost::asio::error::eof)
          break;
        else if (error)
          throw boost::system::system_error(error); // Some other error.

        std::stringstream buf_ss;
        buf_ss.write(buf.data(), len);
        is << buf_ss.str();
    }
}

Client main file:

int main () {
    ... // two different big objects are constructed.
    std::stringstream ss1, ss2;
    ... // serialize bigObj1 -> ss1 and bigObj2 -> ss2; each object is serialized into a string due to a dependency on an external library
    const char * big_obj_bin1 = reinterpret_cast<const char*>(ss1.str().c_str());
    const char * big_obj_bin2 = reinterpret_cast<const char*>(ss2.str().c_str());

    boost::system::error_code ignored_error;
    boost::asio::write(socket, boost::asio::buffer(big_obj_bin1, ss1.str().size()), ignored_error);
    boost::asio::write(socket, boost::asio::buffer(big_obj_bin2, ss2.str().size()), ignored_error);

    ... // do something
    return 0;
}

Server main file:

int main () {
    ... // socket is generated. (communication established)
    std::stringstream ss1, ss2;
    recv(socket,ss1); // this guy absorbs all of incoming data
    recv(socket,ss2); // this guy receives 0 bytes ;-(
    ... // deserialize into the two big objects
    return 0;
}
    Before you implement a protocol on top of TCP, it is *very* important to document that protocol. Otherwise, there's no way to know if your code makes sense or doesn't. Is the termination condition of your `for` loop in `recv` correct? There's no way to know without knowing how the protocol is supposed to work. – David Schwartz Feb 26 '18 at 23:21

1 Answer

recv(socket,ss1); // this guy absorbs all of incoming data

Of course it absorbs everything. You explicitly coded recv to do an infinite loop until eof. That's the end of the stream, which means "whenever the socket is closed on the remote end".

So the essential thing missing from the protocol is framing. The most common ways to address it are:

  • sending the data length before the data, so the server knows how much to read (see the sketch after this list)
  • sending a "special sequence" to delimit frames. In text, a common special delimiter would be '\0'. However, for binary data it is (very) hard to arrive at a delimiter that cannot naturally occur in the payload.

    Of course, if you know extra characteristics of your payload you can use that. E.g. if your payload is compressed, you know you won't regularly find a block of 512 identical bytes (they would have been compressed). Alternatively, you can resort to encoding the binary data in a way that removes the ambiguity. yEnc, Base122 et al. come to mind (see Binary Data in JSON String. Something better than Base64 for inspiration).
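
For illustration, here is a minimal sketch of the length-prefix option, assuming synchronous sockets as in the question. The helper names send_frame/recv_frame and the fixed 4-byte big-endian header are illustrative choices, not part of the original code:

#include <boost/asio.hpp>
#include <array>
#include <cstdint>
#include <string>

using boost::asio::ip::tcp;

// Send one frame: a 4-byte big-endian length header followed by the payload.
void send_frame(tcp::socket &socket, const std::string &payload) {
    const std::uint32_t len = static_cast<std::uint32_t>(payload.size());
    const unsigned char header[4] = {
        static_cast<unsigned char>((len >> 24) & 0xff),
        static_cast<unsigned char>((len >> 16) & 0xff),
        static_cast<unsigned char>((len >>  8) & 0xff),
        static_cast<unsigned char>( len        & 0xff)};
    const std::array<boost::asio::const_buffer, 2> frame = {
        boost::asio::buffer(header), boost::asio::buffer(payload)};
    boost::asio::write(socket, frame); // writes the whole frame or throws
}

// Receive one frame: read exactly 4 header bytes, then exactly len payload bytes.
std::string recv_frame(tcp::socket &socket) {
    unsigned char header[4];
    boost::asio::read(socket, boost::asio::buffer(header));
    const std::uint32_t len = (std::uint32_t(header[0]) << 24) |
                              (std::uint32_t(header[1]) << 16) |
                              (std::uint32_t(header[2]) <<  8) |
                               std::uint32_t(header[3]);
    std::string payload(len, '\0');
    boost::asio::read(socket, boost::asio::buffer(&payload[0], len));
    return payload;
}

With framing in place, the client sends each serialized object with send_frame(socket, ssN.str()) and the server recovers each one with its own recv_frame call, no matter how many objects (2 or 20+) are sent back to back.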

Notes:

Regardless of that

  1. it's clumsy to hand-write the reading loop. It is also unnecessary to do that and to copy the blocks into a stringstream on top of it. If you're doing all that copying anyway, just use boost::asio::[async_]read with boost::asio::streambuf directly (see the sketch after these notes).

  2. This is clear UB:

    const char * big_obj_bin1 = reinterpret_cast<const char*>(ss1.str().c_str());
    const char * big_obj_bin2 = reinterpret_cast<const char*>(ss2.str().c_str());
    

    str() returns a temporary copy of the buffer, which is not only wasteful but also means that the const char* pointers are dangling the moment they have been initialized.
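
A short sketch tying both notes together, assuming the socket and ss1/ss2 from the question and an expected payload size len already known (e.g. from a length header as sketched above):

// Client side: keep named strings alive for the duration of the write,
// instead of taking c_str() of the temporary returned by str().
const std::string payload1 = ss1.str();
const std::string payload2 = ss2.str();
boost::asio::write(socket, boost::asio::buffer(payload1));
boost::asio::write(socket, boost::asio::buffer(payload2));

// Server side: once the expected size len is known, read it in one call
// into a streambuf instead of a hand-written read_some loop.
boost::asio::streambuf sb;
boost::asio::read(socket, sb, boost::asio::transfer_exactly(len));
std::istream in(&sb); // deserialize directly from this stream, no extra copies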

sehe
  • Thank you. By the way, do you have some examples of communication using the different types of asio buffers? In fact, I referred to the original boost::asio examples, but I couldn't understand them well enough from that. Thank you. – user9414424 Feb 28 '18 at 22:33