boost::async_write fails after writing for some time

Question

I am having a very peculiar problem. I have written a server that writes data that it receives from a third party to connected clients. The server writes to the client(s) fine for a while, but after a while, async_write either fails or a write never returns. For my program, if an async_write never returns, then no subsequent writes will take place, and my server will queue up the data it receives from the third party until everything blows up.

I have included my code below:

void ClientPartitionServer::HandleSignal(const CommonSessionMessage& message, int transferSize) {
  boost::lock_guard<boost::mutex> lock(m_mutex);
  if(m_clientSockets.size() != 0) {
    TransferToQueueBuffer(message.GetData(), transferSize);
  }
  if(m_writeCompleteFlag) {
    // TransferToWriteBuffer();
    for(vector<boost::asio::ip::tcp::socket*>::const_iterator i = m_clientSockets.begin(); i != m_clientSockets.end(); ++i) {
      WriteToClient(*i);
    }
  }
}

void ClientPartitionServer::WriteToClient(boost::asio::ip::tcp::socket* clientSocket) {
  m_writeCompleteFlag = false;
  cout << "Iniating write: " << m_identifier << endl;
  boost::asio::async_write(
    *clientSocket,
    boost::asio::buffer(m_queueBuffer.get(), m_queueBufferSize),
    boost::bind(
      &ClientPartitionServer::HandleWrite, this,
      boost::asio::placeholders::error,
      boost::asio::placeholders::bytes_transferred
  ));
}

void ClientPartitionServer::HandleWrite(const boost::system::error_code& ec, size_t bytes_transferred) {
  boost::lock_guard<boost::mutex> lock(m_mutex);
  if(ec != 0) {
    cerr << "Error writing to client: " << ec.message() << " " << m_identifier << endl;
    // return;
    cout << "HandleWrite Error" << endl;
    exit(0);
  }
  cout << "Write complete: " << m_identifier << endl;
  m_writeCompleteFlag = true;
  m_queueBuffer.reset();
  m_queueBufferSize = 0;
}

Any help would be appreciated.

Thank you.

Use ASIO strands instead. See http://www.boost.org/doc/libs/1_46_1/doc/html/boost_asio/tutorial/tuttimer5.html for an example and details. — Sean, May 27 '11 at 23:15
If you are sure that async_write never returns, then it is a bug in asio, because async_write is documented to return immediately. — user239558, Mar 14 '13 at 09:07

score 4 · Answer 1 · answered May 26 '11 at 20:23

4

Without seeing all the code it's hard to say, but it's a red flag to me that you hold the mutex across multiple (or even one) WriteToClient calls. Typically holding a lock of any kind across I/O (even async as you have here) is at best bad for performance and at worst a recipe for weird deadlocks under load. What happens if the async write completes inline and you get called back on HandleWrite in the same thread/callstack, for instance?

I would try to refactor this so that the lock is released during the write calls.

Whatever the solution turns out to be, more general advice:

don't lock across I/Os
add some diagnostic output - what thread calls each handler, and in what order?
try debugging once you hit the quiescent state. Should be able to diagnose a deadlock from the
process state.

answered May 26 '11 at 20:23

Steve Townsend

53,498
9
91
140

However, my program may have concurrent calls to the async_write code block as well as write input to the queue that holds the data to write out. If I don't protect these code blocks with mutexes, then how am I supposed to protect against data corruption from concurrent input/output being performed on the same data? – czchlong May 27 '11 at 19:01
+1 for don't lock across I/O operations. @czchlong read about strands, see [Sean's answer](http://stackoverflow.com/questions/6143739/boostasync-write-fails-after-writing-for-some-time/6158561#615856). – Sam Miller May 28 '11 at 02:29
I don't see that holding locks across IO is a problem. What is usually a problem is holding locks across blocking operations, and holding locks across callbacks. So the question is whether asio does any sort of callbacks inside async_write, or if async_write can block. From reading the documentation, async_write gives both guarantees. It must immediately return, and the handler is guaranteed not to be called from within async_write. – user239558 Mar 14 '13 at 09:00

score 1 · Answer 2 · answered May 27 '11 at 23:19

1

Use strands to serialize access to particular connection objects. In particular, check out strand::wrap(). To see other examples of using strands, check out a a few different timer examples (though the code works for any async_*() call).

answered May 27 '11 at 23:19

Sean

9,888
4
40
43

score 0 · Answer 3 · answered Mar 14 '13 at 09:20

First of all, I don't agree with the comments indicating that holding locks across an async operation is a problem.

Holding locks across:

Any function that invokes callbacks is bad.
Any blocking operation is bad.

async_write explicitly guarantees to neither block, nor call the handler, so it looks good to me to hold the lock.

However, I can see a bug in your code that violates another requirement that async_write has. You are not allowed to call async_write until the completion handler has been invoked. That's what you violate.

The m_writeCompleteFlag is set to true whenever one of the handlers have been invoked. This means that you are likely to violate the async_write rules for some of the other N-1 sockets under high load.

boost::async_write fails after writing for some time

3 Answers3

Linked