why are multiple shared_future objects needed to synchronize data

Question

A pointer to a data structure is shared with multiple threads via std::promise and std::shared_future. From the book 'C++ concurrency in action' by Anthony Williams (pg. 85-86), it seems that data is only correctly synchronized when each receiving thread uses a copy of the std::shared_future object as opposed to each thread accessing a single, global std::shared_future.

To illustrate, consider a thread creating bigdata and passing a pointer to multiple threads that have read-only access. If data synchronization between threads is not handled correctly, memory reordering may lead to undefined behavior (eg. a worker_thread reading incomplete data).

This (incorrect ?) implementation uses a single, global std::shared_future:

#include <future>

struct bigdata { ... };

std::shared_future<bigdata *> global_sf;

void worker_thread()
{
    const bigdata *ptr = global_sf.get();
    ...  // ptr read-only access
}

int main()
{
    std::promise<bigdata *> pr;
    global_sf = pr.get_future().share();

    std::thread t1{worker_thread};
    std::thread t2{worker_thread};

    pr.set_value(new bigdata);
    ...
}

And in this (correct) implementation, each worker_thread gets a copy of std::shared_future:

void worker_thread(std::shared_future<bigdata *> sf)
{
    const bigdata *ptr = sf.get();
    ...
}

int main()
{
    std::promise<bigdata *> pr;
    auto sf = pr.get_future().share();

    std::thread t1{worker_thread, sf};
    std::thread t2{worker_thread, sf};

    pr.set_value(new bigdata);
    ....

I am wondering why the first version is incorrect.

If std::shared_future::get() was a non-const member function, it would make sense since accessing a single std::shared_future from multiple threads would then be a data race itself. But since this member function is declared const, and the global_sf object is synchronized with the threads, it is safe to access concurrently from multiple threads.

My question is, why exactly is this only guaranteed to work correctly if each worker_thread receives a copy of the std::shared_future ?

"Unlike std::future, std::shared_future's shared state is not invalidated when get() is called." http://en.cppreference.com/w/cpp/thread/shared_future/valid — xaxxon, Dec 29 '16 at 00:53
@xaxxon yes, otherwise you could not call `get()` on a single, global `shared_future` from multiple threads — LWimsey, Dec 29 '16 at 01:20

score 4 · Accepted Answer · edited Jun 20 '20 at 09:12

Your implementation using a single, global shared_future is completely fine, if slightly unusual, and the book appears to be mistaken.

[futures.shared_future] ¶2

[ Note: Member functions of shared_future do not synchronize with themselves, but they synchronize with the shared state. — end note ]

Notes are non-normative, so the above is redundantly making explicit a fact which is already implicit in the normative wording.

[intro.races] ¶2

Two expression evaluations conflict if one of them modifies a memory location and the other one reads or modifies the same memory location.

¶6

Certain library calls synchronize with other library calls performed by another thread.

[...Additional paragraphs defining happens before in terms of synchronizes with...]

¶19

Two actions are potentially concurrent if they are performed by different threads... The execution of a program contains a data race if it contains two potentially concurrent conflicting actions, at least one of which is not atomic, and neither happens before the other...

[res.on.data.races] ¶3

A C++ standard library function shall not directly or indirectly modify objects accessible by threads other than the current thread unless the objects are accessed directly or indirectly via the function’s non-const arguments, including this.

So we know that calls to global_sf.get() in different threads are potentially concurrent unless you accompany them with additional synchronization (e.g. a mutex). But we also know that calls to global_sf.get() in different threads do not conflict, because it is a const method and hence forbidden from modifying objects accessible from multiple threads, including *this. So the definition of a data race (unsequenced, potentially concurrent conflicting actions) is not satisfied, the program does not contain a data race.

One would usually wish to avoid global variables anyway, but that is a separate issue.

Note that if the book is correct, then it contains a contradiction. The code which it claims is correct still contains a global shared_future which is accessed from multiple threads when they create their local copies:

void worker_thread()
{
    auto local_sf = global_sf; // <-- unsynchronized access of global_sf here

    const bigdata *ptr = local_sf.get();
    ...
}

I agree there is no data race for the reasons you mentioned. I wasn't quite sure whether (reference) access to the `bigdata *` in the shared state would be synchronized with the workers, but the line you included _"Member functions of `shared_future` ... synchronize with the shared state"_ contains the answer. — LWimsey, Jan 01 '17 at 20:44
In the code you included, why is access to `global_sf` unsynchronized ? It should be synchronized with the creation of the thread (since that induces a happen-before relationship) and the `shared_future` copy constructer treats it as `const` — LWimsey, Jan 01 '17 at 20:45
The access of `global_sf` in thread `t1` is unsynchronized with respect to the access of `global_sf` in thread `t2`, just like in your first example. It is treated as `const`, that's why both examples are OK. My last point was that the book claims that your first example is wrong and that my last example is correct, which is inconsistent. The book doesn't suggest your second example. — Oktalist, Jan 01 '17 at 21:31

why are multiple shared_future objects needed to synchronize data

1 Answers1

Linked