asio implicit strand and data synchronization

Question

When I read asio source code, I am curious about how asio making data synchronized between threads even a implicit strand was made. These are code in asio:

io_service::run

 mutex::scoped_lock lock(mutex_);

  std::size_t n = 0;
  for (; do_run_one(lock, this_thread, ec); lock.lock())
    if (n != (std::numeric_limits<std::size_t>::max)())
      ++n;
  return n;

io_service::do_run_one

 while (!stopped_)
  {
    if (!op_queue_.empty())
    {
      // Prepare to execute first handler from queue.
      operation* o = op_queue_.front();
      op_queue_.pop();
      bool more_handlers = (!op_queue_.empty());

      if (o == &task_operation_)
      {
        task_interrupted_ = more_handlers;

        if (more_handlers && !one_thread_)
        {
          if (!wake_one_idle_thread_and_unlock(lock))
            lock.unlock();
        }
        else
          lock.unlock();

        task_cleanup on_exit = { this, &lock, &this_thread };
        (void)on_exit;

        // Run the task. May throw an exception. Only block if the operation
        // queue is empty and we're not polling, otherwise we want to return
        // as soon as possible.
        task_->run(!more_handlers, this_thread.private_op_queue);
      }
      else
      {
        std::size_t task_result = o->task_result_;

        if (more_handlers && !one_thread_)
          wake_one_thread_and_unlock(lock);
        else
          lock.unlock();

        // Ensure the count of outstanding work is decremented on block exit.
        work_cleanup on_exit = { this, &lock, &this_thread };
        (void)on_exit;

        // Complete the operation. May throw an exception. Deletes the object.
        o->complete(*this, ec, task_result);

        return 1;
      }
    }

in its do_run_one, the unlock of mutex are all before execute handler. If there is a implicit strand, handler will not executed concurrent, but the problem is: thread A run a handler which modify data, and thread B run next handler which read the data which had been modified by thread A. Without the protect of mutex, how thread B seen the changes of data made by thread A? The mutex unlocking ahead of handler execution doesn't make a happen-before relationship between threads access the data which handler accessed. When I go further, the handler execution use a thing called fenced_block:

 completion_handler* h(static_cast<completion_handler*>(base));
    ptr p = { boost::addressof(h->handler_), h, h };

    BOOST_ASIO_HANDLER_COMPLETION((h));

    // Make a copy of the handler so that the memory can be deallocated before
    // the upcall is made. Even if we're not about to make an upcall, a
    // sub-object of the handler may be the true owner of the memory associated
    // with the handler. Consequently, a local copy of the handler is required
    // to ensure that any owning sub-object remains valid until after we have
    // deallocated the memory here.
    Handler handler(BOOST_ASIO_MOVE_CAST(Handler)(h->handler_));
    p.h = boost::addressof(handler);
    p.reset();

    // Make the upcall if required.
    if (owner)
    {
      fenced_block b(fenced_block::half);
      BOOST_ASIO_HANDLER_INVOCATION_BEGIN(());
      boost_asio_handler_invoke_helpers::invoke(handler, handler);
      BOOST_ASIO_HANDLER_INVOCATION_END;
    }

what is this? I know fence seems a sync primitive which supported by C++11 but this fence is totally writen by asio itself. Does this fenced_block help to do the job of data synchronization?

UPDATED

After I google and read this and this, asio indeed use memory fence primitive to synchronize data in threads, that is more faster than unlock till the handler execute complete(speed difference on x86). In fact Java volatile keyword is implemented by insert memory barrier after write & before read this variable to make happen-before relationship.

If someone could simply describe asio memory fence implemenation or add something I missed or misunderstand, I will accept it.

Can you elaborate as to what you do not understand with the thread A and B problem? If no concurrency occurs because handlers are executing through a strand (implicit or explicit), then if thread A modifies X, why would thread B, which runs after thread A, not observe a change to X? A mutex provides no benefits in the absence of concurrency. — Tanner Sansbury, Jun 01 '13 at 16:22
This is visibility problem and it defined clearly in java memory model. C++ has no such thing(well defined memory model) but I think it is same due to thread cache because this is a character of CPU. Thread A modify data might only modify the copy which cached in its own area(CPU cache) but left the main memory untouched. Without a synchronization, thread B may or may not seen this updated value, that's depend on whether cpu cache flushing to main memory. But in c++, there is no such thing called visibility problem, it just says: whenever write or read a variable, use a lock — jean, Jun 02 '13 at 10:31

Tanner Sansbury · Accepted Answer · 2013-06-03T16:27:42.930

Before the operation invokes the user handler, Boost.Asio uses a memory fence to provide the appropriate memory reordering without forcing mutual execution of handler execution. Thus, thread B would observe changes to memory that occurred within the context of thread A.

C++03 did not specify requirements for memory visibility with regards to multi-threaded execution. However, C++11 defines these requirements in § 1.10 Multi-threaded executions and data races, as well as the Atomic operations and Thread support library sections. Boost and C++11 mutexes do perform the appropriate memory reordering. For other implementations, it is worth checking the mutex library's documentation to verify memory reordering occurs.

Boost.Asio memory fences are an implementation detail, and thus always subject to change. Boost.Asio abstracts itself from the architecture/compiler specific implementations through a series of conditional defines within asio/detail/fenced_block.hpp where only a single memory barrier implementation is included. The underlying implementation is contained within a class, for which a fenced_block alias is created via a typedef.

Here is a relevant excerpt:

#elif defined(__GNUC__) && (defined(__hppa) || defined(__hppa__))
# include "asio/detail/gcc_hppa_fenced_block.hpp"
#elif defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
# include "asio/detail/gcc_x86_fenced_block.hpp"
#elif ...

...

namespace asio {
namespace detail {

...

#elif defined(__GNUC__) && (defined(__hppa) || defined(__hppa__))
typedef gcc_hppa_fenced_block fenced_block;
#elif defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
typedef gcc_x86_fenced_block fenced_block;
#elif ...

...

} // namespace detail
} // namespace asio

The implementations of the the memory barriers are specific to the architecture and compilers. Boost.Asio has a family of asio/detail/*_fenced_blocked.hpp header files. For example, the win_fenced_block uses InterlockedExchange for Borland; otherwise it uses the xchg assembly instruction, which has an implicit lock prefix when used with a memory address. For gcc_x86_fenced_block, Boost.Asio uses the memory assembly instruction.

If you find yourself needing to use a fence, then consider the Boost.Atomic library. Introduced in Boost 1.53, Boost.Atomic provides an implementation of thread and signal fences based the C++11 standard. Boost.Asio has been using its own implementation of memory fences prior to the Boost.Atomic being added to Boost. Also, the Boost.Asio fences are scoped based. fenced_block will perform an acquire in its constructor, and a release in its destructor.

asio implicit strand and data synchronization

1 Answers1

Linked