
I have a server application that uses boost ASIO to communicate with several clients. The server application runs on a Linux server and the clients run on Windows desktops.

The current design is multi-threaded, although there is only one boost ASIO thread (which runs boost::asio::io_context). The boost ASIO thread is only responsible for reading, writing, and some infrequent dispatch. Reading is done using boost::asio::async_read, which copies the resulting message so that another thread can do the work of processing. Writing is done using boost::asio::write, but by then the message has already been copied and handed off to the boost ASIO thread.

Under most circumstances, when a client disconnects, boost ASIO reports an error, I shut down the associated socket, and the other sockets keep working. However, if a client's Windows desktop has a power failure while boost::asio::write is writing to it, boost does not detect an issue and hangs in boost::asio::write. It sometimes hangs for almost 20 minutes, and the server cannot communicate with other clients during this time.

From what I have read online, the authors of boost ASIO have no intention of introducing a timeout parameter. I tried setting SO_SNDTIMEO to 5 seconds, but that didn't have any effect on the write hang. As of now, my best guess to solve the issue is to give every socket its own thread so that one client cannot take down the other clients. Are there any better options than this? If I do give every socket its own thread, does that mean I will need a boost::asio::io_context per thread to avoid the write hang?

Edit: After seeing the comments I tried redoing the function that calls boost::asio::write with boost::asio::async_write. Below I have some code that was simplified for SO but still shows what the overall change was:

Originally with boost::asio::write:

inline void MessagingServer::writeMessage(
    GuiSession* const  a_guiSession,
    const PB::Message& a_msg
) {
    boost::asio::dispatch(m_guiIoIoContext, [this, a_guiSession, a_msg]() {
        // I removed code that writes a_msg's bytes into m_guiIoWriteBuf
        // and sets totalSize to simplify for SO

        boost::system::error_code error;
        boost::asio::write(a_guiSession->m_guiIoGsSocket, boost::asio::buffer(m_guiIoWriteBuf, totalSize), error);
        if (UNLIKELY(error))
            ERRLOG << a_guiSession->m_gsSessionId << " write failed: " << error.message();
    });
}

Redone with boost::asio::async_write:

inline void MessagingServer::writeMessage(
    GuiSession* const  a_guiSession,
    const PB::Message& a_msg
) {
    a_guiSession->m_tempMutex.lock();

    boost::asio::dispatch(m_guiIoIoContext, [this, a_guiSession, a_msg]() {
        // I removed code that writes a_msg's bytes into m_guiIoWriteBuf
        // and sets totalSize to simplify for SO

        boost::asio::async_write(
            a_guiSession->m_guiIoGsSocket,
            boost::asio::buffer(m_guiIoWriteBuf, totalSize),
            [this, a_guiSession](const boost::system::error_code& a_error, std::size_t) {
                if (UNLIKELY(a_error))
                    ERRLOG << a_guiSession->m_gsSessionId << " write failed: " << a_error.message();

                a_guiSession->m_tempMutex.unlock();
            }
        );
    });
}

The lock was introduced in the second version to guarantee that only one call to boost::asio::async_write is active at a time (I am aware that there are more performant ways to do this, but this is simpler for testing). Both versions have the same issue of hanging boost ASIO when the client has a power failure. However, they hang in different ways: the asynchronous version does allow boost ASIO to perform other actions, just not further writes until the hanging one produces an error.

During a separate experiment I tried setting SO_KEEPALIVE, but that also did not solve the hang issue.

asimes
  • Considering your server communicates with multiple clients, you should probably be using `async_write()` instead of `write()` in the first place, and it would make this effectively a non-issue. – Frank Nov 22 '21 at 19:57
  • This is not a problem with asio, but with the underlying OS API, or more likely with how TCP works under the hood. You probably want to set up [SO_KEEPALIVE](https://stackoverflow.com/questions/5435098/how-to-use-so-keepalive-option-properly-to-detect-that-the-client-at-the-other-e). See also https://www.boost.org/doc/libs/1_77_0/doc/html/boost_asio/reference/socket_base/keep_alive.html – sbabbi Nov 22 '21 at 20:10
  • @Frank, Please see my edit to the question, I tried out replacing `write` with `async_write`. It seems that change alone is not enough to solve the problem – asimes Nov 23 '21 at 17:08

1 Answer


I concur with the commenters that this is how TCP generally works.

Note that you can introduce timeouts using an ASIO timer, which allows you to cancel asynchronous operations on your sockets.

There are many examples if you search for:

  • boost::asio::steady_timer, boost::asio::high_resolution_timer and analogous members of the std::chrono family of clocks
  • boost::asio::deadline_timer
sehe