I have a server application that uses boost ASIO to communicate with several clients. The server application runs on a Linux server and the clients run on Windows desktops.
The current design is multi-threaded, although there is only one boost ASIO thread (which runs boost::asio::io_context). The boost ASIO thread is only responsible for reading, writing, and some infrequent dispatch. Reading is done using boost::asio::async_read, and the resulting message is copied so that another thread can do the work of processing. Writing is done using boost::asio::write, but by then the message has already been copied and handed off to the boost ASIO thread.
Under most circumstances, when a client disconnects, boost ASIO reports an error, I shut down the associated socket, and the other sockets keep working. However, if a client's Windows desktop has a power failure while boost::asio::write is writing to it, boost does not detect an issue and hangs in boost::asio::write. It sometimes hangs for almost 20 minutes, and the server cannot communicate with other clients during this time.
From what I have read online, the authors of boost ASIO have no intention of introducing a timeout parameter. I tried setting SO_SNDTIMEO to 5 seconds, but that didn't have any effect on the write hang. As of now, my best guess to solve the issue is to give every socket its own thread so that one client cannot take down the other clients. Are there any better options than this? If I do give every socket its own thread, does that mean I will need a boost::asio::io_context per thread to avoid the write hang?
Edit: After seeing the comments, I tried redoing the function that calls boost::asio::write with boost::asio::async_write. Below is some code that was simplified for SO but still shows what the overall change was:
Originally with boost::asio::write:
    inline void MessagingServer::writeMessage(
        GuiSession* const a_guiSession,
        const PB::Message& a_msg
    ) {
        boost::asio::dispatch(m_guiIoIoContext, [this, a_guiSession, a_msg]() {
            // I removed code that writes a_msg's bytes into m_guiIoWriteBuf
            // and sets totalSize to simplify for SO
            boost::system::error_code error;
            boost::asio::write(a_guiSession->m_guiIoGsSocket, boost::asio::buffer(m_guiIoWriteBuf, totalSize), error);
            if (UNLIKELY(error))
                ERRLOG << a_guiSession->m_gsSessionId << " write failed: " << error.message();
        });
    }
Redone with boost::asio::async_write:
    inline void MessagingServer::writeMessage(
        GuiSession* const a_guiSession,
        const PB::Message& a_msg
    ) {
        a_guiSession->m_tempMutex.lock();
        boost::asio::dispatch(m_guiIoIoContext, [this, a_guiSession, a_msg]() {
            // I removed code that writes a_msg's bytes into m_guiIoWriteBuf
            // and sets totalSize to simplify for SO
            boost::asio::async_write(
                a_guiSession->m_guiIoGsSocket,
                boost::asio::buffer(m_guiIoWriteBuf, totalSize),
                [this, a_guiSession](const boost::system::error_code& a_error, std::size_t) {
                    if (UNLIKELY(a_error))
                        ERRLOG << a_guiSession->m_gsSessionId << " write failed: " << a_error.message();
                    a_guiSession->m_tempMutex.unlock();
                }
            );
        });
    }
The lock was introduced in the second version to guarantee that only one call to boost::asio::async_write is active at a time (I am aware that there are more performant ways to do this, but this is simpler for testing). Both versions have the same issue of hanging boost ASIO when the client has a power failure. However, they hang in different ways: the asynchronous code does allow boost ASIO to perform other actions, just not further writes until the hanging one produces an error.
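To illustrate what I mean by a more performant way to serialize writes, here is a minimal sketch of a per-session write queue. It is not my actual code: WriteQueue, StartWrite, and Completion are hypothetical names, and the StartWrite hook only stands in for a call to boost::asio::async_write. It also assumes every method runs on the single boost ASIO thread, so it does no locking of its own.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <utility>

// Hypothetical per-session write queue: at most one write is in flight and
// later messages wait in the deque instead of blocking the caller on a mutex.
// Assumption: all methods are called from the single I/O thread.
class WriteQueue {
public:
    using Completion = std::function<void(bool ok)>;
    // Hypothetical transport hook standing in for boost::asio::async_write.
    // The data reference stays valid until the completion has been invoked.
    using StartWrite = std::function<void(const std::string& data, Completion done)>;

    explicit WriteQueue(StartWrite startWrite)
        : m_startWrite(std::move(startWrite)) {}

    // Queue a message; start writing immediately if nothing is in flight.
    void push(std::string msg) {
        m_pending.push_back(std::move(msg));
        if (!m_writing)
            startNext();
    }

    std::size_t pending() const { return m_pending.size(); }
    bool writing() const { return m_writing; }

private:
    void startNext() {
        assert(!m_writing && !m_pending.empty());
        m_writing = true;
        // The front element is kept in the deque while the write is in
        // flight, so the buffer handed to the transport remains alive.
        m_startWrite(m_pending.front(), [this](bool ok) { onDone(ok); });
    }

    void onDone(bool ok) {
        m_writing = false;
        if (!ok) {
            m_pending.clear();  // session is broken: drop everything queued
            return;
        }
        m_pending.pop_front();
        if (!m_pending.empty())
            startNext();
    }

    StartWrite m_startWrite;
    std::deque<std::string> m_pending;
    bool m_writing = false;
};
```

As far as I can tell, this would only remove the mutex. A client that lost power would still stall its own queue until the pending write finally produces an error.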
During a separate experiment I did try setting SO_KEEPALIVE, but that also did not solve the hang issue.
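For context, this is a minimal sketch of how the two socket options I mention (SO_SNDTIMEO and SO_KEEPALIVE) can be set and read back on a native descriptor on Linux. The helper names are hypothetical and not from my server; with boost ASIO the descriptor would come from the socket's native_handle(). Note that SO_SNDTIMEO bounds a blocking send() call on the descriptor, so it may not apply if the library waits for writability in some other way.

```cpp
#include <cassert>
#include <sys/socket.h>
#include <sys/time.h>
#include <netinet/in.h>
#include <unistd.h>

// Hypothetical helper: set a send timeout (whole seconds) on a native socket.
// Returns true if setsockopt succeeded.
bool setSendTimeout(int fd, int seconds) {
    assert(fd >= 0);
    timeval tv{};
    tv.tv_sec = seconds;
    tv.tv_usec = 0;
    return ::setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv)) == 0;
}

// Hypothetical helper: read the send timeout back, or -1 on failure.
long getSendTimeoutSeconds(int fd) {
    assert(fd >= 0);
    timeval tv{};
    socklen_t len = sizeof(tv);
    if (::getsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, &len) != 0)
        return -1;
    return static_cast<long>(tv.tv_sec);
}

// Hypothetical helper: turn on TCP keepalive probes for the socket.
// The probe timing itself is left at the system defaults here.
bool enableKeepAlive(int fd) {
    assert(fd >= 0);
    int on = 1;
    return ::setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) == 0;
}

// Hypothetical helper: check whether keepalive is enabled on the socket.
bool isKeepAliveEnabled(int fd) {
    assert(fd >= 0);
    int on = 0;
    socklen_t len = sizeof(on);
    if (::getsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, &len) != 0)
        return false;
    return on != 0;
}
```

Reading the values back with getsockopt at least confirms that the options were accepted, which helps separate "the option was never applied" from "the option was applied but does not affect the hang".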