0

TL;DR: What is meant by saying a specific function is 'thread-safe' as a data race occurs by simultaneously calling two possibly different functions? This question is especially relevant in the context of people telling "const means/implies thread-safe in C++11" [1][2]


Consider the following example:

class X {
    int x, y; // are some more complex type (not supported by `std::atomic`)
    std::mutex m;
public:
    void set_x (int new_x) {x = new_x;} // no mutex
    void get_x () const {return x;}

    void set_y (int new_y) {
        std::lock_guard<std::mutex> guard(m); // guard setter with mutex
        y = new_y;
    }
    void get_y () const {return y;}
}

Is set_x thread safe?

Off course, set_x is not thread safe as calling it from two threads simultaneously results in a data race.

Are get_x, get_y and set_y thread safe?

Two possible reasonings exists:

  1. Yes, they are thread safe, as calling get_x/get_y/set_y from two threads simultaneously does not result in a data race.
  2. No, they are not thread safe, as calling get_x (or get_y) and set_x (or set_y) from two threads simultaneously results in a data race.

Which one is the correct reasoning for each of those three functions?


Question summary

Which reasoning is correct?

  1. A function is thread safe iff calling it from two threads simultaneously does not result in a data race. Could work for set_x/get_x, but fails for set_y/get_y, as this would result to the conclusion that set_y and get_y are thread safe, but class Y isn't as calling set_y and get_y from two threads simultaneously results in a data race.
  2. A function is thread safe iff it does not access any memory that could be modified without internal synchronization by another function. This seems to me the most consistent option, but is not the way it is often used (see related threads).

Related threads

Note that I have read the following related threads:

m7913d
  • 10,244
  • 7
  • 28
  • 56
  • 3
    Consider `std::atomic x;` instead of `int x;` for the thread safe implementation. – Richard Critten Apr 18 '21 at 00:20
  • @RichardCritten, you are absolutely right, but it will not be enough to ensure thread safety. The method used to manipulate the shared variable x must be in a synchronized way. – Manifest Man Apr 18 '21 at 00:29
  • @ManifestMan, why isn't it enough for thread safety? – m7913d Apr 18 '21 at 00:31
  • @RichardCritten, you are right, but that solution does not work for more complex types, maybe I should edit my example to use a more complex data member such as `std::string`. – m7913d Apr 18 '21 at 00:35
  • @RichardCritten, in addition to the definition of atomic variable through **std::atomic x** proper atomic load/store are also needed. – Manifest Man Apr 18 '21 at 01:25
  • 1
    @ManifestMan please explain and see also https://en.cppreference.com/w/cpp/atomic/atomic/load , https://en.cppreference.com/w/cpp/atomic/atomic/operator%3D – Richard Critten Apr 18 '21 at 11:40
  • @Richard Critten, the provided link says about **std::atomic::load - Atomically loads and returns the current value of the atomic variable. Memory is affected according to the value of order.**, but the user should better use `std::memory_order_seq_cst` for ordering while calling those functions (consistency reasons). Because so many things can go wrong there. – Manifest Man Apr 19 '21 at 09:55
  • 1
    @ManifestMan The default memory ordering value is `std::memory_order_seq_cst` for both methods `std::atomic::load` and `std::atomic::store`. There's no need to add unnecessary clutter to the code. Deep down, `std::atomic` fulfills its purpose by behaving like a fence / barrier on two different levels: on the compiler level, memory operations aren't reordered across the barrier; on the machine instruction level, the necessary architectural memory barrier instructions are issued, depending on the CPU architecture and the `std::atomic` method called. (cont'd...) – rwong Apr 23 '21 at 01:14
  • 1
    @ManifestMan (...cont'd) Note that the design of `std::atomic` appears to facilitate its use as a semaphore; that is, a flag variable that is to be set by one core after **some other data** had been written completely (by that writer-core); and a flag variable that is to be checked by other cores before commencing a read operation on **that other data**. As far as x86 and ARM are concerned, the current C++ library implementation for `std::atomic` is adequate for guaranteeing correct operation, even if **those other data** aren't written or read using special methods. (cont'd...) – rwong Apr 23 '21 at 01:18
  • @ManifestMan (...cont'd) However, users of GCC will have an additional issue requiring attention - "strict aliasing". GCC is permitted to manipulate memory operations on the advice of its string aliasing analysis; this may affect memory operations, i.e. operations on **those other data** which are typical in C++ and without using special instructions. Examples of such compiler manipulation are: eliminating reads or writes, or replacing them with no-op or constant value, if GCC **believes** that the memory location isn't properly initialized or is never read back. – rwong Apr 23 '21 at 01:22

3 Answers3

1

The C++ standard doesn't use terms like "thread-safe"; it uses more specific language. Humans use terms like "thread-safe" because we find them useful.

The common idea of a thread-safe function is a function that when called, assuming nobody else screws up, will not create a data race. get_x is thread-safe; all things being equal, you may call it from any number of threads and get reasonable results. This is true even though it cannot be concurrently called with set_x, as that causes a data race. But the cause of the data race is that you called a non-thread-safe function: set_x.

The point of categorizing functions as "thread-safe" or not is to assign blame. If you call only "thread-safe" functions, then your code is "thread-safe". If you stray outside of the boundaries of "thread-safe" functions, then it is your straying outside of those boundaries that causes the data race.

It's not get_x's fault that a concurrent set_x call causes a data race.

As for the question of get/set_y, as previously stated, "thread-safe" is not a computational term or a rigorously standard term. It is a human term, a simplification of the computational reality.

The rules of what is "thread-safe" is basically, "you can call any thread-safe function concurrently with any other thread-safe function". If you cannot call get_y and set_y concurrently, then they're not "thread-safe".

From a rigorous perspective, the accurate way to describe these two functions is that set_y synchronizes with other calls to set_y on the same object, and get_y synchronizes with other calls to get_y on the same object. The fact that we don't also say that they synchronize with each other tells you what you need to know.

From a simplified perspective, set_y is "thread-safe"; get_y is not. But you could also say that get_y is "thread-safe" and set_y is not. It doesn't really matter, since it's just a simplification.

And whether you declare get_y const is not what makes it "thread-safe". Sutter is saying that, if you write a const function, it is your job to do it in a way such that it is "thread-safe". Therefore, get_y is broken because it is not written in a thread-safe way, since it cannot be called thread-safely with other thread-safe functions.

Nicol Bolas
  • 449,505
  • 63
  • 781
  • 982
  • I'm aware it is my duty to make a const function bitwise const or internally synchronised. That's way I used the term implies. – m7913d May 01 '21 at 15:49
  • Is it correctly to say that `get_y` is broken according to the C++11 standard? If I interpret it correctly, it's only telling us that calling only const function should be thread safe. – m7913d May 01 '21 at 15:58
  • @m7913d: "*Is it correctly to say that get_y is broken according to the C++11 standard?*" There is no requirement on the user to write a function that is "thread-safe". Only a requirement not to write a *program* which does not introduce a data race. – Nicol Bolas May 01 '21 at 16:33
  • So, what do you mean by `get_y` is broken? It's broken according to the human interpretation of thread safe or it is broken according to the C++ standard as it introduces a data race? – m7913d May 01 '21 at 16:54
  • I agree, that I should have written (to avoid any confusion): "If I interpret it correctly, it's only telling us that calling only const function should not introduce data races." – m7913d May 01 '21 at 16:55
  • @m7913d: "*So, what do you mean by get_y is broken?*" Read the sentence in context. In particular, take note of the sentence right before that one. – Nicol Bolas May 01 '21 at 17:09
  • I'm confused by that context as Sutter made this statement in the context of the C++11 standard, but I believe `get_y` does not violate the C++11 standard as it is a bitwise const function (so no data race is introduced by calling this function and other well-behaving const functions). Sutter uses the term thread-safe in this context for bitwise const or internally synchronised. However, you are saying it doesn't matter wether you call `set_y` or `get_y` thread-safe as it is a simplification. This makes the statement of Sutter meaningless or is at least a confusing usage of thread-safe. – m7913d May 01 '21 at 18:55
  • @m7913d: Sutter is making two points. He is talking about how the standard library treats `const`, *and* he is making the case that this is *also* how users should treat `const`. One of these is definitive, the other is suggestive. The standard can enforce this on its own types, and the standard library will *assume* this for any types you give it to manipulate, but it has no say about how you treat your own types. – Nicol Bolas May 01 '21 at 19:11
  • @m7913d: Let me put it this way. Whether Herb Sutter would call either of the `y` functions "thread-safe" is not nearly as important as what a user will try to do with that information. And if you tell a user that both of those functions are "thread-safe", they will *definitely* feel that it is OK to call either of them on the same object from any thread. And their code will *definitely* be broken. So are they thread-safe? Not in a way that's actually useful to people. – Nicol Bolas May 01 '21 at 19:19
  • But Sutter's point is that almost all classes will be used sooner or later (by you or someone else later on) inside the C++ standard library (f.ex a container class). So, you better make it compatible with it). As stated in the title, I want to be compatible with the C++ standard library and language. (Note that I did quite some research on this topic, before asking the question). – m7913d May 01 '21 at 19:19
  • I think it is relevant wether it is correct to use thread-safe in this case, because if it isn't, we (nor Herb Sutter) should use it in this context as it is confusing and meaningless. – m7913d May 01 '21 at 19:23
  • Do you agree `get_y` complies with the assumptions of the standard library? – m7913d May 01 '21 at 19:26
  • @m7913d: "*But Sutter's point is that almost all classes will be used sooner or later (by you or someone else later on) inside the C++ standard library (f.ex a container class).*" Unless the container class is actually *calling* those functions, their behavior is irrelevant. – Nicol Bolas May 01 '21 at 21:53
  • @m7913d: "*Do you agree `get_y` complies with the assumptions of the standard library?*" I don't see that it matters. `get/set_y` are dysfunctional with regard to how they are *intended* to work, in terms of what the writer of the code clearly *expects* to get out of them. The fact that `set_y` synchronizes access to the member strongly implies that access to `y` is meant to be fully synchronized. But it isn't. What the standard library may or may not do is irrelevant next to the fact that the code is incoherent. – Nicol Bolas May 01 '21 at 21:55
0

I'll also go with the your following statement:

A function is thread safe if it does not access any memory that could be modified without internal synchronization by another function. This seems to me the most consistent option, but is not the way it is often used.

As long as the shared variable is atomic and mutex are properly used to achieve synchronization, I don't see any problem with your above statement.

halfer
  • 19,824
  • 17
  • 99
  • 186
Manifest Man
  • 873
  • 10
  • 16
  • The problem is that the related threads seems to use the first definition for 'thread safe function'. And those guys seems to be highly familiar with the C++ standard especially Herb Sutter (see last related thread). – m7913d Apr 18 '21 at 02:24
  • Are you able to back up this statement with some official documentation (from the C++ Language/Library standard)? – m7913d Apr 18 '21 at 11:02
  • Just take a look [https://en.cppreference.com/w/cpp/language/memory_model#Threads_and_data_races](https://en.cppreference.com/w/cpp/language/memory_model#Threads_and_data_races) – Manifest Man Apr 18 '21 at 16:45
  • That doesn't answer what is meant with a 'thread safe function'. Note that my question is not about how to determine whether a data race (could) occur, but about the terminology of a 'thread safe function' – m7913d Apr 18 '21 at 17:38
  • I would recommand to read following post about **Thread Safety** in this case [https://stackoverflow.com/questions/261683/what-is-the-meaning-of-the-term-thread-safe](https://stackoverflow.com/questions/261683/what-is-the-meaning-of-the-term-thread-safe). – Manifest Man Apr 18 '21 at 18:03
  • I know what thread safety means (no data races), but I'm confused by people stating a single function is thread safe, f.ex. [because it only reads data](https://stackoverflow.com/a/6411001/7621674). Off course the program as a whole is not thread safe if a non-synchronised setter exists. The meaning of a thread safe function is especially relevant if one states that [const functions implies thread safety](https://stackoverflow.com/a/14127380/7621674). – m7913d Apr 18 '21 at 18:42
  • @m7913d It's the data that needs to be protected; functions are a distraction to the discussion. Any code-path (not any function) through a program that can simultaneously read/write (or write/write) the same location needs to be synchronised. – Richard Critten Apr 19 '21 at 00:48
  • @RichardCritten, I agree. Does that mean that stating '_const (functions) implies thread safe_' is nonsense? – m7913d Apr 19 '21 at 09:06
0

Note that this my own opinionated answer based on my own research and provided input from others.

What is the definition of a thread-safe function?

A function is thread safe iff it does not access (read or write) any memory that could be modified without internal synchronization: only set_y is thread-safe.

Note that thread-safe is not explicitly defined by the C++ standard, which uses the term data races. See the answer of Nicol Bolas for more information: thread-safety is not always black and white.

A const function implies thread-safe bitwise const or internally synchronised

The term thread-safe is abused in the context of "a const function implies thread-safe".

What is meant by "a const function implies thread-safe", is that it should be safe to call the const function from multiple threads (without calling a non-const function at the same time in another thread).

As Herb Sutter (29:43) stated himself, in this context, thread-safe means bitwise const or internally synchronised, which isn't really thread-safe if other non-const functions may be called at the same time.

m7913d
  • 10,244
  • 7
  • 28
  • 56