2

I came across supposedly thread-safe code that both I and gcc thread sanitizer don't consider really thread-safe.

The code is something along these lines:

class thread_safe
{
public:
  thread_safe(uint64_t size=0)
    : size_{size}
  {}

  uint64_t get_size()
  {
    if(size_ == 0)
    {
      size_ = query_size();

      if(size_ == 0)
        throw std::runtime_error("couldn't get size");
    }

    return size_;
  }

private:
  // not certain this even matters but maybe vtable could make it worse so placing it here
  virtual uint64_t query_size() = 0;

  uint64_t size_;
};

My guess is that the author didn't want to pay for std::atomic access since query_size() is guaranteed to always return the same value.

So if I consider the "integer assignment is an atomic operation" (not certain whether this is really true or if uint64_t still falls under any such reasoning since it's 64 and not 32 bit but ok...) and that query_size() always returns the same value so per-core-cache should not be an issue... then I can squint my eyes and try to convince myself that this really is thread-safe.

So could this be considered thread safe under C++ in general?

(or at least under some circumstances) (maybe only under C? but I doubt that there is a difference in the standard)

Is there a solution to make assignment inside get_size() function thread safe without changing size_ type or initialization location?

EDIT:

To be clear, I'm not saying that this code is well written, just want to understand if I can present a more solid case against it.

And regarding data race the code has two states:

if (in_current_thread_I_see_size_not_equal_0)
  return size_

(size_ was set in some other thread and eventually came to this core)

and

if (in_current_thread_I_see_size_equal_to_0)
  size_ = set_size_in_this_thread_to_same_value_that_will_be_written_in_other_threads

return size_

(size was not set in any other threads or the value from those writes hasn't reached this threads memory/register)

So in the end for each thread the "size_ already set to != 0" is just optimization (no need to read size from somewhere that could be expensive) and even though it's a race it's not necessarily an important one.

I'm interested into finding out if either standard or at least hardware supports the reasoning of the initial author of the code.

Domen Vrankar
  • 1,743
  • 15
  • 19
  • 2
    Different case, but it explains that it doesnt matter if your hardware can read/write `uint64_t` atomically, its a data race. https://stackoverflow.com/q/16320838/4117728 – 463035818_is_not_an_ai Apr 12 '22 at 12:53
  • @463035818_is_not_a_number The difference between my question and the others is that in my case even if 5 threads set size_ they all set it to the same value so for each thread it's either size_==0 or size_==set_to_value_by_some_thread and the final value in memory would be either 0 or same_value_no_matter_which_thread_wrote_it - it's more like a flag with two states that go from not_set to set_to_value than a variable that's always changing state and different values can come from cache to final memory location. – Domen Vrankar Apr 12 '22 at 13:16
  • 1
    Access to `std::atomic` may be free on your system if `std::atomic::is_always_lock_free == true`. I'm not sure if you can verify atomicity of assignment of something that is not `std::atomic`, and that would be critical to tell if `get_size` is thread-safe or not. – Yksisarvinen Apr 12 '22 at 13:19
  • afaik the value does not matter, but its much simpler: If one thread writes and another reads then its a data race – 463035818_is_not_an_ai Apr 12 '22 at 13:21
  • btw i dont understand the constructor. If a `size` parameter `!=0` is passed then `query_size` is never called. In that case the function is thread safe – 463035818_is_not_an_ai Apr 12 '22 at 13:23
  • @463035818_is_not_a_number It technically is a data race, but as long as `query_size` is thread-safe and really outputs the same value on every call and `_size= someValue;` is atomic, the output will always be the same. Each write writes the same value, and each read will eventually read that data, whether in the first or in second `if`. – Yksisarvinen Apr 12 '22 at 13:25
  • If two or more threads access the same data without synchronization and at least one of them writes to that data you have a data race and, consequently, undefined behavior. If you know what your compiler and hardware do in those circumstances you could determine that the usage is, in fact, thread safe, but you have moved outside of standard C++. – Pete Becker Apr 12 '22 at 13:32
  • @463035818_is_not_a_number Yes. If somebody provides the size upfront it's thread-safe due to being set before code branches out. My problem and paranoia is with the ==0 case... – Domen Vrankar Apr 12 '22 at 13:33
  • if `uint64_t` is thread safe, I believe the compiler would remove all overhead of `std::atomic` anyway (as long as it knows the target) – apple apple Apr 12 '22 at 13:46
  • 2
    This is a data race. In practice, you will get away with it on 64-bit processors. (On 32-bit processors, you have a race when a second thread sees a partially-written `size_`.) You can avoid the overhead on 64-bit processors by using `std::atomic` and `memory_order_relaxed`. (The overhead is unavoidable on 32-bit processors, since you have to avoid tearing.) – Raymond Chen Apr 12 '22 at 13:49
  • 1
    Re, "It technically is a data race, but..." If it technically is a data race, then the behavior technically is undefined, and you would technically be the person at whom the finger is pointed if, at some time in the future, something changes (compiler version, compiler options, operating system version,...) to invalidate your assumption that "...the output will always be the same." I personally have had to diagnose a problem, and explain it to a customer, that was caused by a program depending on UB. The program worked, literally for years, until we upgraded our build tools. – Solomon Slow Apr 12 '22 at 14:33

2 Answers2

1

This is not thread safe:

    if(size_ == 0)
    {
      size_ = query_size();
      // ...
    }
    return size_;

One thread could be reading size_ while another thread is writing to it. The writing thread may have only completed a partial write before the reader reads the value (a value that is nonzero because of the partial write), which it then returns. The caller in the reader thread will now receive a different value than the caller in the writer thread.

The C++ standard does not guarantee that reading/writing a uint64_t is atomic.

AndyG
  • 39,700
  • 8
  • 109
  • 143
  • So it would seem that I was right regarding not being C++ standard. Until a week ago I would have agree with this answer but combination of https://stackoverflow.com/a/2729701/102068 (yes I saw the "not entirely correct answer" comment) "I have never in my life heard of an 64-bit integer write, which can be split into two 32-bit writes, being interrupted halfway through." + code owners assuring me that this code works correctly in production for the past 12 or so years made me doubt my reasoning and start searching for arguments for/against and explanations if it works on more than pure luck. – Domen Vrankar Apr 12 '22 at 14:03
  • @DomenVrankar the linked answer is misleading at best. At worst it's flat out wrong. Besides, a `uint64_t` is not even guaranteed to be exactly 64 bits per the standard. Only *at least* 64 bits. In the end, the environment that your code is targeted to and runs on will determine if these operations are atomic. The C++ standard does not, and probably will not ever make atomicity guarantees about other primitives (that's why we have `atomic*` types) – AndyG Apr 12 '22 at 14:27
  • Just an update from my research. Above code can be thread safe from the assembly point of view (architecture support + optimizer not doing something strange so still luck) and that's why the authors of my confusion believe that the code is correct. From C++ point of view it's as you said - not standard supported. So all in all an accident waiting to happen if architecture/compiler/optimizations change. – Domen Vrankar Apr 12 '22 at 20:50
  • 2
    @DomenVrankar "but code just works" is one of the most ridiculous statements a programmer can make. If he/she doesn't understand why it works and will continue working this statement cad be discarded right away. And given that the "code owners" didn't even make the variable `volatile` they clearly have no idea what they are doing. – ixSci Apr 13 '22 at 04:57
  • @ixSci I agree with you, but if you're not the owner you need to pitch reasons to change the code... I hope I have enough now but the fact that compiler can actually foe years somewhat relyably generate working code with -O2 makes the convincing harder but at least calms you down that it's not completely broken due to architecture guarantees. – Domen Vrankar Apr 14 '22 at 06:58
0

This code and the comments took me down a rabbit hole.

It turns out that using non atomic variables and them working is a mixture of architecture, use case and luck with compiler/optimizer.

It turns out that as it was noted by some comments std::atomic<uint64_t> operations can be (almost) free on architectures where uint64_t is atomic from the architecture point of view meaning x86_64 (from Intel® 64 and IA-32 Architectures Software Developer's Manual Volume 3A: System Programming Guide, Part 1):

8.1.1 Guaranteed Atomic Operations

...

The Pentium processor (and newer processors since) guarantees that the following additional memory operations will always be carried out atomically:

• Reading or writing a quadword aligned on a 64-bit boundary.

...

Looking at the output assembly reads on x86_64 generate the same assembly for both std::atomic<uint64_t> and uint64_t (but code ordering changes so some optimizations seem to be disabled for std::atomic - first red flag that this only works for cases like the one above where you don't have inter variable dependencies). Assignment emits slightly different assembly and as stated by a comment memory_order_relaxed during assignment generates fairly similar code.

On the other hand on ARM64 LDR/STR are replaced by LDREX/STREX so I'd say that even if this code would work under -O0 (and threads not calling the same function time wise too close together) it would probably fail under any other optimization level even though size_ is always assigned the same value.

I haven't checked any 32bit architecture but based on a comment it sounds that it's similar to ARM64 differences that I saw - pushing luck.

So the bottom line is that:

  • the code really does work reliably and correctly (albeit due to favorable circumstances and luck)
  • is not supported by the C++ standard/memory model and thus is UB (as stated by the comments) even in case of such coincidental hardware support
  • future instruction set improvements and/or new compiler optimizations could easily break this behavior since it doesn't satisfy C++ abstract machine

Makes me feel a bit better to know that it's not tsan being too strict but actual luck that code works correctly for now :)

Domen Vrankar
  • 1,743
  • 15
  • 19