
Consider the following code that writes the same value to the same memory location from multiple threads:

#include <cassert>
#include <thread>

void f(int* buf, int n, int* p) {
    for (int i = 0; i < n; i++)
        buf[i] = i;          // both threads write the same value to each slot
    *p = buf[n / 2];
}

void g(int* buf, int n) {
    int x1, x2;
    std::thread t1(f, buf, n, &x1);
    std::thread t2(f, buf, n, &x2);
    t1.join();
    t2.join();
    assert(x1 == x2);
}

Although it's interesting, I'm less concerned with what guarantees the standard gives, since I guess it gives none. What I do care about is the behavior of the above code on real-world multiprocessor hardware. Will the assert always pass, or is there any chance of a race condition, cache-synchronization problems, etc.?

Yakov Galka
  • I'm sorry, but I don't quite understand the question. How can a data race (which I think is occurring) change the result? I think x1 always equals x2 no matter how your program is run. Care to help me understand your question please? – atoMerz Nov 29 '11 at 18:45
  • @AtoMerZ: data races are more complicated than just those caused by thread interleaving. In multilevel cache systems you can get plenty of strange results. – Yakov Galka Nov 29 '11 at 18:49
  • I think I understand parallelism and a little bit about how caches work. But since I still have problem understanding your question and it keeps being voted up, I'm guessing I'm missing something here. Any hints to clear things up for me? – atoMerz Nov 29 '11 at 18:54
  • @AtoMerZ: Let's phrase it this way: no one has ever told me what happens at the hardware level if multiple cores write to the same memory location non-atomically. – Yakov Galka Nov 29 '11 at 19:15
  • I don't think the assert in this scenario will ever fail. But if you're trying to understand what happens at the hardware level, I think the most notable effect is false sharing, as stated by @Maxim Yegorushkin. This can incur a performance penalty and slow your program down, but the results are still the same. – atoMerz Dec 02 '11 at 10:06
  • The C++ compiler is free to rewrite your function f in any way it pleases, provided the result is provably the same as what you wrote under the assumption that the program is race-free. In particular, it may very well start f by actually clearing the buffer, or it may use parts of the buffer as intermediate storage prior to writing the final values. Why? I have no idea, but modern optimizing compilers can and do stranger things than this all the time, and they are getting more and more aggressive about it. So an assert like this is an accident waiting to happen. – Don Hatch Mar 19 '17 at 23:41

4 Answers


There is a race, but in your example both threads will write the same values to the same addresses. Since you are not doing any read-modify-writes, just writing predetermined numbers, this will be safe in most cases. Writing an int is an atomic instruction on most systems. The exception would be running this code on an 8-bit microprocessor that uses a sequence of instructions to store an int. Even then it may still work, but that depends on the implementation of the library code that performs the multi-byte store.

TJD
  • This is pretty much what I was going to say, especially with the caveat that if he was writing a complex type rather than an int, there would be a chance of the `buf[n/2]` value being in an inconsistent state due to one thread writing it while the other is reading it. – Kylotan Nov 29 '11 at 18:53
  • I wrote an int just as an example. But it's in no way atomic, since there are no memory barriers here. How would writing a char, or running it on an 8-bit microprocessor, change the result? – Yakov Galka Nov 29 '11 at 18:54
  • @Kylotan: this is exactly my question: how can this happen? You claim it can. – Yakov Galka Nov 29 '11 at 18:55
  • Depends on the write operation. Imagine a string write - you might decide to memset the whole buffer to zero before writing the new characters in place, meaning a read that takes place could see the empty buffer, or a half-written buffer, etc. It would only apply to types which have an assignment operator whose write operation isn't written in terms of an atomic piece of code. With ints and most primitive types, the chip manufacturer probably makes guarantees that a write is atomic. With other types, it depends on the language, platform, chip, etc. – Kylotan Nov 29 '11 at 19:06
  • Some interesting discussion here - note especially the bit about non-aligned floating point values. http://stackoverflow.com/questions/1292786/is-updating-double-operation-atomic – Kylotan Nov 29 '11 at 19:13

Memory models with regard to multithreading concern when the effects of writes made by one thread become observable by another thread. In the code you posted both threads write the same values into the same memory locations, so it doesn't matter which thread's write buf[n/2] reads; either will do.

Modern processors employ cache coherency protocols, such as MESI, so when the threads write to the buffer concurrently a lot of messages are sent between the CPUs to synchronize the cache lines holding the buffer, making the code run much slower than in a non-concurrent scenario (the false-sharing effect).

Here it doesn't matter if the writes are atomic or not, since both threads write the same values to the same memory locations. There is a race, but it doesn't matter which thread wins because the observed values are going to be the same even with partial writes.

Maxim Egorushkin
  • -1, because the last paragraph is only true for simple data types. Since this is a C++ question, and the original poster stated in a comment that the use of 'int' was just an example, you have to take into account any data type and its assignment operator. As such, it's easy to form examples where partial writes yield inconsistent reads. – Kylotan Nov 30 '11 at 00:17
  • @Kylotan My post assumes there are no intermediate writes, that is, concurrent threads write only once to each byte. Given that, I think you'd have a difficult time forming an example, since partial writes, as I said, don't make a difference here. – Maxim Egorushkin Nov 30 '11 at 08:11
  • I gave an example in a comment below - a string assignment could be implemented as clearing a buffer followed by copying into the buffer (or deleting a buffer followed by assigning a new one) - a partial write here followed by a read yields a completely different result. – Kylotan Nov 30 '11 at 11:12
  • @Kylotan: in your example each thread would write to each byte more than once, so it doesn't apply. – Maxim Egorushkin Nov 30 '11 at 11:19
  • The OP said that int was just an example, and the language chosen was C++ which permits complex assignments with the same operator. Still, you've given him the answer he wanted to hear, which presumably means he was primarily interested in cache management and not in language behaviour. – Kylotan Nov 30 '11 at 12:03

The key point here is, as @Maxim said, cache coherency. On a cache-coherent architecture it is indeed impossible for the assert to fail.

However, it can go wrong on a machine with no cache coherency. I can't name a specific architecture; although such machines are almost extinct due to natural selection, as far as I know some remain. (If you know an example, please comment.)

Here is a table that represents an execution of two threads filling a zeroed region of memory with ones. For brevity the example is scaled down by a factor of 32, i.e. each digit represents one 4-byte int, and the cache line size is 4 ints == 4 digits. The lines marked "flush" are points where the on-chip cache is flushed to main memory. In reality flushing is non-deterministic, as it may happen at any time, e.g. due to a preemptive task switch.

Core 1 cache              Memory                    Core 2 cache
------------------------------------------------------------------------------
                          0000
0000 (load cache)         0000
1000 (set 1st bit)        0000
1100 (set 2nd bit)        0000                      0000 (load cache)
**** (flush)              1100
                          1100                      1000 (set 1st bit)
                          1000                      **** (flush)
                          1000                      1000 (load cache)
                          1000                      1100 (set 2nd bit)
1000 (load cache)         1000                      1110 (set 3rd bit)
1010 (set 3rd bit)        1000                      1111 (set 4th bit)
1011 (set 4th bit)        1111                      **** (flush)
**** (flush)              1011

So we got a wrong result in the end.

I emphasize again that this counterexample is valid only on cache-incoherent machines.

Yakov Galka
  • True, but I don't think any commercial real processors (which is what you were asking about) would leave cache coherency to the programmer. So in the real world you wouldn't have to worry about this unless you're designing your own processor. – atoMerz Dec 09 '11 at 09:13

No data race.

CPP reference says on Memory model: Threads and data races:

When an evaluation of an expression writes to a memory location and another evaluation reads or modifies the same memory location, the expressions are said to conflict. A program that has two conflicting evaluations has a data race unless:

  • both evaluations execute on the same thread or in the same signal handler, or

  • both conflicting evaluations are atomic operations (see std::atomic), or

  • one of the conflicting evaluations happens-before another (see std::memory_order).

If a data race occurs, the behavior of the program is undefined.

It says: "another evaluation modifies". It does not say "another evaluation writes". Writing the same value to a memory location does not count as modifying it. Therefore, no conflict; therefore, no data race.

Amit