
I'm reading a paper by Boehm, "Threads Cannot be Implemented as a Library", but I'm struggling with an example of why a library-based solution for threads does not help.

The example initializes both x and y to zero and then runs two threads:

// thread 0
if (x == 1) ++y;

// thread 1
if (y == 1) ++x;

I understand that there is no data race because, under sequential consistency, the comparison (i.e. the load) occurs before the assignment (i.e. the store) in each thread, so each store can only execute after a load that has already observed the other thread's store.

So there is no way that, for example, the following situation occurs:

// thread 0 assigns y
++y;            // that means x is already 1

// thread 1 is checking
if (y == 1)     // that means x is still 0

Consequently, the outcome x = 0, y = 0 is guaranteed.
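
To convince myself, I wrote a small checker (my own sketch, not from the paper) that enumerates all six sequentially consistent interleavings of the four operations; every one of them leaves x and y at zero:

#include <stdio.h>

int main(void)
{
  /* The 6 interleavings of two 2-step threads: each row says which
     thread moves at each of the 4 slots (0 = thread 0, 1 = thread 1). */
  static const int sched[6][4] = {
    {0,0,1,1}, {0,1,0,1}, {0,1,1,0},
    {1,0,0,1}, {1,0,1,0}, {1,1,0,0},
  };

  for (int s = 0; s < 6; ++s) {
    int x = 0, y = 0;
    int r0 = -1, r1 = -1;   /* values loaded by thread 0 / thread 1 */
    int step[2] = {0, 0};   /* next step of thread 0 / thread 1     */

    for (int i = 0; i < 4; ++i) {
      if (sched[s][i] == 0) {
        if (step[0]++ == 0) r0 = x;   /* thread 0: load x             */
        else if (r0 == 1)   ++y;      /* thread 0: conditional store  */
      } else {
        if (step[1]++ == 0) r1 = y;   /* thread 1: load y             */
        else if (r1 == 1)   ++x;      /* thread 1: conditional store  */
      }
    }
    printf("interleaving %d: x = %d, y = %d\n", s, x, y);
  }
  return 0;
}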

However, a compiler that is aware of single-threaded code only is free to transform these threads into

// thread 0
++y; if (x != 1) --y;

// thread 1
++x; if (y != 1) --x;

and then there is a data race, for example:

// thread 0 stores y
--y;

// thread 1 loads y
if (y != 1)
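
For contrast, a language-level solution rules this out. A minimal sketch, assuming C11 atomics (which postdate Boehm's paper): with x and y declared _Atomic, the compiler is not allowed to invent the speculative stores above, because other threads could observe them:

#include <stdatomic.h>

_Atomic int x = 0, y = 0;

// thread 0: the store to y may not be hoisted above the load of x
void thread0(void) { if (atomic_load(&x) == 1) atomic_fetch_add(&y, 1); }

// thread 1: likewise, no speculative store to x is permitted
void thread1(void) { if (atomic_load(&y) == 1) atomic_fetch_add(&x, 1); }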

But I still do not understand why a thread library cannot help. The most trivial implementation that I can think of simply uses a mutex to avoid the data race:

#include <stdio.h>
#include <threads.h>

int x = 0, y = 0;
mtx_t m;

int thread0(void *arg)
{
  mtx_lock(&m); if (x == 1) ++y; mtx_unlock(&m);
  return y;
}

int thread1(void *arg)
{
  mtx_lock(&m); if (y == 1) ++x; mtx_unlock(&m);
  return x;
}

int main()
{
  thrd_t t0, t1;
  int r0, r1;

  mtx_init(&m, mtx_plain);

  thrd_create(&t0, thread0, NULL);
  thrd_create(&t1, thread1, NULL);

  thrd_join(t0, &r0);
  thrd_join(t1, &r1);

  mtx_destroy(&m);

  printf("x = %d, y = %d\n", r1, r0);

  return 0;
}

There is a similar question, and the answer is about the circular definition of data race in pthreads. But with the implementation above, I don't care what the memory model is: the compiler is free to reorder the code if it wants, but the output x = 0, y = 0 is always guaranteed.

Do I misunderstand anything?

Ta Thanh Dinh
  • As one of the answers in the _similar_ question you point out says, thread utilities (such as those provided in _Boost_) are wrappers of thread services provided by the operating system. And it is not the compiler that orders thread execution. The compiler, at compile time, simply creates the machine code. It is the operating system, during run-time, that orders thread execution and allocates time-slices for each thread in due time. There are several links _[included in this post](https://stackoverflow.com/a/19324717/645128)_ that may help to expand your understanding of threads in general. – ryyker Aug 06 '19 at 11:49
  • "_I understand that there is no data race because_ (...)" In my Q [What formally guarantees that reads don't see a write (with race condition) in other threads](https://stackoverflow.com/q/56673990/963864) the consensus was that **[there is no such guarantee](https://stackoverflow.com/questions/56673990/what-formally-guarantees-that-reads-dont-see-a-write-with-race-condition-in-o#comment99915442_56673990)** (9 upvotes for that claim!) – curiousguy Aug 11 '19 at 21:31

1 Answer


While it is likely that you misunderstand something, your analysis is fine. The paper referenced overplays its thesis.

The UNIX and Linux kernels (amongst many others) are themselves large multi-threaded programs which operate with only (the equivalent of) library-based thread support. These large multi-threaded programs have exhibited shocking performance, reliability and scalability, from tiny PDP-based computers to massive supercomputers.

A Java-based OS was produced by Sun Labs to the open ridicule of all who had the opportunity to give it a whirl. It remains in an unmarked grave.

The secondary line of reasoning, that busy-waiting is more effective than locking primitives, had been kicking around for at least a decade before that paper. Everybody loves lockless because it makes great benchmarks, whereas unbounded non-deterministic race conditions scare people who want nice safe systems. The thing is, sometimes racy compare-and-swap (CAS) is good, sometimes it is bad. Some clever systems use an optimistic CAS to implement mutexes, leaving the opportunity for somewhat readable code and good benchmarks.
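
To make that last point concrete, here is a minimal sketch (my own illustration, not any particular system's code) of a mutex acquire built on an optimistic CAS, using C11 atomics:

#include <stdatomic.h>

typedef struct { atomic_int locked; } spinlock;  /* 0 = free, 1 = held */

void spin_lock(spinlock *l)
{
  int expected = 0;
  /* Optimistic CAS: try to swing 0 -> 1; on failure, reset and retry.
     Real systems add backoff and a futex-style sleep on contention. */
  while (!atomic_compare_exchange_weak_explicit(
             &l->locked, &expected, 1,
             memory_order_acquire, memory_order_relaxed))
    expected = 0;  /* a failed CAS writes the observed value back */
}

void spin_unlock(spinlock *l)
{
  atomic_store_explicit(&l->locked, 0, memory_order_release);
}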

Again, the bold statement of impossibility is hyperbolic, based on the idea that the compiler is capricious, so it will make menacing assumptions and overwrite memory at will. Thankfully, the "free and good enough" technologies slew these dragons.

mevets
  • Re, "...the idea that the compiler is capricious..." I've been burned by a capricious compiler: A bug, subtle enough to go undetected in testing, appeared in our product when we started using a new version of GCC. Our code used Undefined Behavior: we tested the result of an integer addition to see if it overflowed. Older versions of the compiler optimized the code as if the hardware would do what the hardware actually does in that case. The new compiler said, Oh! UB! I can assume that overflow never happens, and that branch of the code therefore is unreachable. It _silently_ threw it away. (See the sketch after these comments.) – Solomon Slow Aug 06 '19 at 22:00
  • It is a sad turn. In the 90s the WATCOM compiler was one of these; I had to work with it doing OS work, and it was just awful. Mercifully gcc kicked it out to the street a generation ago. Now, hopefully, clang will resist the benchmark urge and put gcc out of its misery.... – mevets Aug 06 '19 at 22:18
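
To illustrate the overflow pitfall from Solomon Slow's comment above (a hedged sketch with made-up function names, not the actual product code):

#include <limits.h>

/* UB: signed overflow is undefined, so the compiler may assume the
   addition never overflows and silently delete the check below. */
int add_checked_ub(int a, int b)
{
  int sum = a + b;                               /* UB if it overflows */
  if ((b > 0 && sum < a) || (b < 0 && sum > a))  /* may be optimized away */
    return -1;
  return sum;
}

/* Well defined: test against the limits before adding. */
int add_checked_ok(int a, int b)
{
  if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
    return -1;  /* would overflow */
  return a + b;
}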