3

I have some multi-threaded code that uses Boost's SSL ASIO library. The code is multi-threaded, but there's a mutex associated with each SSL connection and calls to async_* functions are done holding the mutex.

I sometimes see my code grind to a halt. There's a stuck thread. The thread is stuck in bio_write. The stack trace looks this:

#0  0x00000000010a974f in bio_write ()   

#1  0x00000000010a3529 in BIO_write ()

#2  0x000000000105fe72 in ssl3_write_pending ()

#3  0x0000000001060b02 in ssl3_write_bytes ()

#4  0x0000000000cce43a in boost::asio::ssl::detail::engine::do_write
(this=0x1e618a0, data=0x234c7b0, length=189) at
/usr/include/boost/asio/ssl/detail/impl/engine.ipp:294

#5  0x0000000000cce109 in boost::asio::ssl::detail::engine::perform(this=0x1e618a0, op=(int
(boost::asio::ssl::detail::engine::*)(boost::asio::ssl::detail::engine *
const, void *, std::size_t)) 0xcce3fc<boost::asio::ssl::detail::engine::do_write(void*, unsigned long)>,
data=0x234c7b0, length=189, ec=..., bytes_transferred=0x7fff8ad74d48) at
/usr/include/boost/asio/ssl/detail/impl/engine.ipp:219

#6  0x0000000000ccdd23 in boost::asio::ssl::detail::engine::write
(this=0x1e618a0, data=..., ec=..., bytes_transferred=@0x7fff8ad74d48: 0) at
/usr/include/boost/asio/ssl/detail/impl/engine.ipp:137

#7  0x0000000000dac504 in boost::asio::ssl::detail::write_op<boost::asio::mutable_buffers_1>::operator()
(this=0x7fff8ad74d20, eng=..., ec=..., bytes_transferred=@0x7fff8ad74d48: 0)
at /usr/include/boost/asio/ssl/detail/write_op.hpp:51

#8  0x0000000000da9d8a in boost::asio::ssl::detail::io_op<boost::asio::basic_stream_socket<boost::asio::ip::tcp,
boost::asio::stream_socket_service<boost::asio::ip::tcp> >,
boost::asio::ssl::detail::write_op<boost::asio::mutable_buffers_1>,
boost::asio::detail::write_op<boost::asio::ssl::stream<boost::asio::basic_stream_socket<boost::asio::ip::tcp,
boost::asio::stream_socket_service<boost::asio::ip::tcp> > >,
boost::asio::mutable_buffers_1, boost::asio::detail::transfer_all_t,
boost::_bi::bind_t<void, boost::_mfi::mf2<void, Peer,
boost::system::error_code const&, unsigned long>,
boost::_bi::list3<boost::_bi::value<boost::shared_ptr<Peer> >, boost::arg<1>
(*)(), boost::arg<2> (*)()> > > >::operator() (this=0x7fff8ad74d10, ec=..., 
bytes_transferred=0, start=1) at /usr/include/boost/asio/ssl/detail/io.hpp:136

#9  0x0000000000da7f42 in boost::asio::ssl::detail::async_io<boost::asio::basic_stream_socket<boost::asio::ip::tcp,
boost::asio::stream_socket_service<boost::asio::ip::tcp> >,
boost::asio::ssl::detail::write_op<boost::asio::mutable_buffers_1>,
boost::asio::detail::write_op<boost::asio::ssl::stream<boost::asio::basic_stream_socket<boost::asio::ip::tcp,
boost::asio::stream_socket_service<boost::asio::ip::tcp> > >,
boost::asio::mutable_buffers_1, boost::asio::detail::transfer_all_t,
boost::_bi::bind_t<void, boost::_mfi::mf2<void, Peer,
boost::system::error_code const&, unsigned long>,
boost::_bi::list3<boost::_bi::value<boost::shared_ptr<Peer> >, boost::arg<1> (*)(),
boost::arg<2> (*)()> > > > (next_layer=..., core=..., op=...,
handler=...) at /usr/include/boost/asio/ssl/detail/io.hpp:322

#10 0x0000000000da634d in
boost::asio::ssl::stream<boost::asio::basic_stream_socket<boost::asio::ip::tcp,
boost::asio::stream_socket_service<boost::asio::ip::tcp> >
>::async_write_some<boost::asio::mutable_buffers_1,
boost::asio::detail::write_op<boost::asio::ssl::stream<boost::asio::basic_stream_socket<boost::asio::ip::tcp,
boost::asio::stream_socket_service<boost::asio::ip::tcp> > >,
boost::asio::mutable_buffers_1, boost::asio::detail::transfer_all_t,
boost::_bi::bind_t<void, boost::_mfi::mf2<void, Peer,
boost::system::error_code const&, unsigned long>,
boost::_bi::list3<boost::_bi::value<boost::shared_ptr<Peer> >, boost::arg<1>
(*)(), boost::arg<2> (*)()> > > > (this=0x1e61880, buffers=..., handler=...)
at /usr/include/boost/asio/ssl/stream.hpp:502

#11 0x0000000000da3032 in
boost::asio::detail::write_op<boost::asio::ssl::stream<boost::asio::basic_stream_socket<boost::asio::ip::tcp,
boost::asio::stream_socket_service<boost::asio::ip::tcp> > >,
boost::asio::mutable_buffers_1, boost::asio::detail::transfer_all_t,
boost::_bi::bind_t<void, boost::_mfi::mf2<void, Peer,
boost::system::error_code const&, unsigned long>,
boost::_bi::list3<boost::_bi::value<boost::shared_ptr<Peer> >, boost::arg<1>
(*)(), boost::arg<2> (*)()> > >::operator() (this=0x7fff8ad74f00, ec=...,   
bytes_transferred=0, start=1) at /usr/include/boost/asio/impl/write.hpp:250 

#12 0x0000000000d9ffbb in
boost::asio::async_write<boost::asio::ssl::stream<boost::asio::basic_stream_socket<boost::asio::ip::tcp,
boost::asio::stream_socket_service<boost::asio::ip::tcp> > >,
boost::asio::mutable_buffers_1, boost::_bi::bind_t<void,
boost::_mfi::mf2<void, Peer, boost::system::error_code const&, unsigned
long>, boost::_bi::list3<boost::_bi::value<boost::shared_ptr<Peer> >,  
boost::arg<1> (*)(), boost::arg<2> (*)()> > > (s=..., buffers=...,     
handler=...) at /usr/include/boost/asio/impl/write.hpp:585

Any ideas?

David Schwartz
  • 179,497
  • 17
  • 214
  • 278
  • The boost version might be relevant, can you add that to your question? I say this because the SSL implementation was rewritten in boost 1.47. – Sam Miller Mar 12 '13 at 03:02
  • Is it safe to assume this is the only [outstanding write](http://www.boost.org/doc/libs/release/doc/html/boost_asio/reference/async_write/overload1.html) operation for the stream? Doing otherwise may result in undefined behavior. As an example, it isn't sufficient to `lock mutex; async_write(...); unlock mutex;` as the operation is still in flight until the completion handler is invoked. – Sam Miller Mar 12 '13 at 03:04
  • Boost 1.50. I double checked that our one outstanding write logic is correct. We set a flag when we post a write and clear it when it completes. We do not post a write if the flag is set. (We do have `async_read` and `async_write` outstanding at the same time.) We might do an `async_shutdown` with a write pending. Is that safe? – David Schwartz Mar 12 '13 at 04:06
  • 2
    With [`ssl::stream`](http://www.boost.org/doc/libs/1_53_0/doc/html/boost_asio/reference/ssl__stream.html), all asynchronous operations, including intermediate operations from composed operations, must be performed within the same implicit or explicit strand. If the composed operations' completion handlers are not wrapped in a strand, then it can produce undefined behavior when multiple threads are running the `io_service`. – Tanner Sansbury Mar 12 '13 at 04:22
  • @twsansbury: Ouch. Thanks. Make that an answer and I'll accept it. – David Schwartz Mar 12 '13 at 05:13

2 Answers2

5

This is the result of a multi-threading issue.

For most Boost.Asio objects, it is safe to have multiple asynchronous operations pending on an object; it is just specified that concurrent calls on the object are unsafe. Nevertheless, it normally does not present a problem on some types, such as ip::tcp::socket. However, ssl::stream accentuates that:

The application must also ensure that all asynchronous operations are performed within the same implicit or explicit strand.

In your case, the solution is to wrap() each composed operation's completion handler by the same strand. This will result in all intermediate operations being invoked within the strand. For more details about thread safety and nuance details of strands, considering reading this answer.

Community
  • 1
  • 1
Tanner Sansbury
  • 51,153
  • 9
  • 112
  • 169
  • It seems like performing all operations within the same implicit strand is impossible. I don't see any way other than `strand.wrap` to insure that intermediate handlers won't run concurrently with direct calls to `async_*` functions. – David Schwartz Mar 12 '13 at 16:57
  • @DavidSchwartz: With half-duplex communication, one can normally use an implicit strand. However, with full-duplex communication, explicit strands provide a fairly clean solution. The alternative is writing a custom functor wrapping type and implementing [`asio_handler_invoke`](http://www.boost.org/doc/libs/1_53_0/doc/html/boost_asio/reference/asio_handler_invoke.html). – Tanner Sansbury Mar 12 '13 at 17:11
  • I opted to use an explicit strand. Strand-per-connection seems like an ugly model to me, but the functor approach creates huge deadlock risk (I don't like code holding an application-level lock without really knowing it does). I consider this one of the worst Boost warts. They should have made non-blocking operations on SSL connections as thread safe as they are on TCP connections. Because they didn't, I have lots of dispatches to strand-wrapped functions that should have just been able to acquire a lock and call the function directly. – David Schwartz Mar 14 '13 at 19:17
0

It might help if you could provide the code. Kind of hard to analyze it just from the stack trace. If for some reason you can't provide the code, then here are some ideas:

bio_write is the open SSL method that is called to write data out to the socket. If you are trying to call this directly, there may be some issues. Someone else asked for help on trying to use it a couple of months ago and got zero answers.

I am also using ASIO with SSL. My code uses one thread for writing out messages to the socket and another thread to handle reading messages from the socket. My client is talking with multiple servers, but there are a total of 2 threads to handle all socket I/O across all servers. Here is my code for writing out to the socket, which so far has no problems under Windows:

void SSLSocket::SendWorkerThread(SSLSocket* psSLS)
{
   // This thread method that gets called to process the messages to be sent to the server.
   //
   // Since this has to be a static method, call a method on the class to handle server requests.
   psSLS->ProcessSendRequests();
}

void SSLSocket::ProcessSendRequests()
{
   // This method handles sending msgs to the server.
   //
   std::stringstream ss;
   DWORD WaitResult;
   Log.LogString("SSLSocket::ProcessSendRequests: Worker thread " + Logger::NumberToString(boost::this_thread::get_id()) + " started.\n", LogInfo);
   // Loop until the user quits, or an error of some sort is thrown.
   try
   {
      do
      {
         // If there are one or more msgs that need to be sent to a server, then send them out.
         if (SendMsgQ.Count() > 0)
         {
            Message* pMsg = SendMsgQ.Front();
            SSLSocket* pSSL = pMsg->pSSL;
            SendMsgQ.Pop();
            const Byte* pBuf = pMsg->pBuf;
            const int BytesInMsg = pMsg->BytesInMsg;
            boost::system::error_code Error;
            {
               Locking CodeLock(SocketLock); // Single thread the code.
               // unsigned int BytesTransferred = boost::asio::write(*pSSL->pSocket, boost::asio::buffer(pBuf, BytesInMsg), Error);
               boost::asio::async_write(*pSSL->pSocket, boost::asio::buffer(pBuf, BytesInMsg), boost::bind(&SSLSocket::HandleWrite, this,
                  boost::asio::placeholders::error, boost::asio::placeholders::bytes_transferred));
            }
            ss << "SSLSocket::ProcessSendRequests: # bytes sent = " << BytesInMsg << "\n";
            Log.LogString(ss.str(), LogDebug2);
            Log.LogBuf(pBuf, BytesInMsg, DisplayInHex, LogDebug3);
         }
         else
         {
            // Nothing to send, so go into a wait state.
            WaitResult = WaitForSingleObject(hEvent, INFINITE);
            if (WaitResult != 0L)
            {
               Log.LogString("SSLSocket::ProcessSendRequests: WaitForSingleObject event error.  Code = " + Logger::NumberToString(GetLastError()) + ". \n", LogError);
            }
         }
      } while (ReqAlive);
      Log.LogString("SSLSocket::ProcessSendRequests: Worker thread " + Logger::NumberToString(boost::this_thread::get_id()) + " done.\n", LogInfo);
   }
   catch (std::exception& e)
   {
      stringstream ss;
      ss << "SSLSocket::ProcessSendRequests: threw an error - " << e.what() << ".\n";
      Log.LogString(ss.str(), LogError);
      Stop();
   }
}

Please note that the handshaking must be completed before trying to write to the socket. Learned that one the hard way.

if this does not help, try printing out the number of bytes it is trying to write out. If it is trying to write out a large number of bytes, then maybe that is worth looking into. ASIO might be ok with a large number, but SSL might not. You might also try putting the program in debug and set a breakpoint on the call to the write statement. Try taking a look at the buffer structure to make sure it is intact as well as other objects like the io_service object and socket object to see if anything has been stepped on or perhaps unintenionally destructed.

I am using version c of the SSL library, since the OpenSSL website claimed that is the most stable version. So, if you are using a more recent version than that and all else fails, you might try using version c instead.

Community
  • 1
  • 1
Bob Bryan
  • 3,687
  • 1
  • 32
  • 45
  • 1
    Boost has `async_handshake`. Having your one and only sending thread sleep if a handshake isn't complete doesn't make much sense. – David Schwartz Mar 12 '13 at 04:20
  • this doesn't really answer David's question – Sam Miller Mar 12 '13 at 04:31
  • What would you do instead? Before, I was only talking with one server and did not have the sleep code in there. Now I am talking with 6 servers, which can be connected to at any point in time. I really don't want to create another thread. So, what to do? Maybe I should put in a counter so it will exit the loop in a reasonable period of time if the handshaking should fail - which I have never seen. The code works fine for non-error cases and I have not seen it go through that loop more than 3 times. – Bob Bryan Mar 12 '13 at 04:34
  • @BobBryan: If a handshake wasn't complete, I'd call `async_handshake`. I'd know the handshake was complete when my callback function was invoked. – David Schwartz Mar 12 '13 at 05:21
  • @DavidSchwartz: Thank you for your suggestion - you were right the design I had was not too good. I have changed the code so that it no longer tries to send any messages before the handshake is complete. This was not as easy as it appears because the message processing is all done in C# code, so I had to implement a new message to notify the C# code that it can now write to a given server. – Bob Bryan Mar 12 '13 at 20:04