Cache coherency doesn't take effect in an infinite loop demo

Question

Another question contains this statement:

All caches are coherent. That is, you will never have two different values for the same memory location. This coherency is maintained by a version of the MESI protocol.
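The quoted claim can be illustrated with a toy simulation of MESI for a single cache line shared by two cores. This is a simplified sketch written for this explanation (`MesiDemo`, `Line`, `read` and `write` are made-up names), not how any real CPU implements the protocol; real processors add store buffers and vendor-specific variants such as Intel's MESIF. It shows that a write invalidates every other copy, so the other core's next load misses and fetches the new value:

```java
import java.util.Arrays;

// Toy model of the MESI protocol for ONE cache line shared by several cores.
// Simplified for illustration: no bus timing, no store buffers, no MESIF "F" state.
public class MesiDemo {
    enum State { MODIFIED, EXCLUSIVE, SHARED, INVALID }

    static final class Line {
        final State[] state;  // state of the line in each core's cache
        final int[] cached;   // value of the line in each core's cache
        int memory;           // value of the line in main memory

        Line(int cores) {
            state = new State[cores];
            cached = new int[cores];
            Arrays.fill(state, State.INVALID);
        }

        // A MODIFIED copy elsewhere is written back before another core may use the line.
        private void writeBackOthers(int core) {
            for (int c = 0; c < state.length; c++) {
                if (c != core && state[c] == State.MODIFIED) {
                    memory = cached[c];
                }
            }
        }

        int read(int core) {
            if (state[core] == State.INVALID) {          // cache miss
                writeBackOthers(core);
                boolean othersHaveIt = false;
                for (int c = 0; c < state.length; c++) {
                    if (c != core && state[c] != State.INVALID) {
                        state[c] = State.SHARED;         // remote M/E/S copies become SHARED
                        othersHaveIt = true;
                    }
                }
                cached[core] = memory;
                state[core] = othersHaveIt ? State.SHARED : State.EXCLUSIVE;
            }
            return cached[core];                         // M, E and S are all local hits
        }

        void write(int core, int value) {
            // From SHARED or INVALID the core must first gain ownership:
            // any MODIFIED copy is written back and every other copy is invalidated.
            // From EXCLUSIVE or MODIFIED no other copy exists, so nothing is sent.
            if (state[core] == State.SHARED || state[core] == State.INVALID) {
                writeBackOthers(core);
                for (int c = 0; c < state.length; c++) {
                    if (c != core) state[c] = State.INVALID;
                }
            }
            cached[core] = value;
            state[core] = State.MODIFIED;
        }
    }

    public static void main(String[] args) {
        Line line = new Line(2);
        line.read(0);      // core 0 alone: EXCLUSIVE
        line.read(1);      // both cores: SHARED
        line.write(1, 1);  // core 1 writes (think t2 setting stop): core 0 becomes INVALID
        System.out.println(Arrays.toString(line.state)); // [INVALID, MODIFIED]
        int seen = line.read(0); // core 0 misses and fetches the new value
        System.out.println(seen + " " + Arrays.toString(line.state)); // 1 [SHARED, SHARED]
    }
}
```

Note that the model only delivers the new value to a core that actually performs the load again.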

My question is: why is iv.stop always false in the code below? It seems that cache coherency doesn't take effect. BTW, my PC's CPU is an i7-4700HQ.

I know that there is no happens-before relationship here between the read and the write of the shared variable stop, so this is a data race in Java. I just want to know why cache coherency doesn't take effect. Since thread t2 has changed stop in the cache of the core it runs on, cache coherency should make that change visible to the core running thread t1.

```java
public class InfiniteLoop {
    boolean stop = false;

    Boolean another = null;

    public static void main(String[] args) {

        final InfiniteLoop iv = new InfiniteLoop();

        Thread t1 = new Thread(() -> {
            System.out.println("t1 iv address-->"+iv); //t1 iv address-->com.nestvision.thread.InfiniteLoop@48b96cb8
            while (!iv.stop) {
                //Not System.out.println(iv.stop) here to avoid 
                //a lock action of PrintStream object.
                iv.another = iv.stop;
            }
            System.out.println("done");
        });

        Thread t2 = new Thread(() -> {
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
            System.out.println("t2 iv address-->"+iv);//t2 iv address-->com.nestvision.thread.InfiniteLoop@48b96cb8
            iv.stop = true;
            System.out.println("t2 end");
        });
        t2.start();
        t1.start();
        try {
            Thread.sleep(1000);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        System.out.println(iv.another); //this will print 'false'
    }
}
```

When programming in Java, you should probably read about Java synchronization and the Java memory model, not about low-level CPU stuff. — user140547, Mar 13 '17 at 14:07
Implement an atomic getter and setter for iv, and give it a go. — Isuru H, Mar 13 '17 at 16:54
Could you also try to print the memory address (object address within the JVM) of iv inside both threads? — Isuru H, Mar 13 '17 at 21:15
@IsuruH The getter and setter behave the same way. I printed the address of `iv` in each thread, as you can see in the code above; they're the same object, as predicted. — XiangZzz, Mar 14 '17 at 02:55
@XiangZzz Thanks for the address, but I do not see any atomic getters and setters for iv in the code. — Isuru H, Mar 14 '17 at 07:30
@IsuruH I didn't post them here, but I tested them in my IDE. It seems the reason has been found; take a look at bashnesnos's comments on the answer below if you are interested. — XiangZzz, Mar 14 '17 at 08:18
Thanks, I saw @bashnesnos's comments, but I thought introducing an atomic getter and setter would have introduced a barrier in the background. And frankly, I thought Java is such a rich language that its atomicity properties should have spared you the trouble of adding a barrier in the code when there is atomic access to the concurrently shared variable. — Isuru H, Mar 14 '17 at 08:28
@IsuruH Did your "atomic getters and setters" mean methods in the `java.util.concurrent.atomic` package? If so, of course they will introduce a memory barrier and the loop will terminate. But I deliberately didn't introduce any barrier in the loop, to reproduce the infinite loop. — XiangZzz, Mar 14 '17 at 08:43
Either that or a custom getter and setter with a lock. So in your code, instead of accessing iv directly, you would access it through these methods, which serialize accesses to the variable and make modifications coherent among cores. — Isuru H, Mar 14 '17 at 09:19
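Isuru H's two suggestions can be sketched as follows, assuming the flag is wrapped behind a small accessor interface (`StopFlag`, `AtomicStop`, `LockedStop` and `stopsWorker` are hypothetical names, not from the question). `AtomicStop` uses `java.util.concurrent.atomic.AtomicBoolean`; `LockedStop` guards a plain `boolean` with `synchronized` accessors:

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class StopFlags {
    // Hypothetical interface: what the loop in t1 and the write in t2 would call.
    interface StopFlag {
        boolean isStopped();
        void requestStop();
    }

    // Option 1: java.util.concurrent.atomic. get()/set() have the memory
    // effects of a volatile read/write.
    static final class AtomicStop implements StopFlag {
        private final AtomicBoolean stop = new AtomicBoolean(false);
        public boolean isStopped() { return stop.get(); }
        public void requestStop() { stop.set(true); }
    }

    // Option 2: a plain field guarded by this object's lock. Releasing the lock
    // in requestStop() happens-before any later acquisition in isStopped().
    static final class LockedStop implements StopFlag {
        private boolean stop = false;
        public synchronized boolean isStopped() { return stop; }
        public synchronized void requestStop() { stop = true; }
    }

    // Runs a t1-style spin loop on the flag, sets the flag from the calling
    // thread (the t2 role), and reports whether the loop ended within 5 seconds.
    static boolean stopsWorker(StopFlag flag) {
        Thread worker = new Thread(() -> {
            while (!flag.isStopped()) {
                // spin, reading the flag only through the accessor
            }
        });
        worker.setDaemon(true); // don't keep the JVM alive if the loop never ends
        worker.start();
        try {
            Thread.sleep(100);
            flag.requestStop();
            worker.join(5_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !worker.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("atomic: " + stopsWorker(new AtomicStop()));
        System.out.println("locked: " + stopsWorker(new LockedStop()));
    }
}
```

Either way, the visibility guarantee comes from the Java memory model (volatile semantics or monitor unlock/lock), not from the shape of the accessors; a plain, unsynchronized getter and setter would still be a data race.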

score 2 · Answer 1 · edited Jun 20 '20 at 09:12

First, your print at the end could run before the threads start, and definitely before they both finish. You probably want to use some of the Java concurrency classes to block the main thread until the other threads finish, although that will probably loop and wait forever until you apply the fix below.
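For this first point, `Thread.join` is enough to make main wait for the worker threads instead of guessing with `Thread.sleep`. Below is a minimal sketch with a hypothetical helper `joinAll` that uses one overall deadline, so that a thread stuck in an endless loop makes the helper return `false` rather than hang the caller:

```java
import java.util.concurrent.TimeUnit;

public class AwaitThreads {
    // Waits for all threads, but gives up at one shared deadline so that a
    // thread stuck in an endless loop cannot block the caller forever.
    // Returns true only if every thread has terminated.
    static boolean joinAll(long timeout, TimeUnit unit, Thread... threads) {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        try {
            for (Thread t : threads) {
                long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
                if (remainingMs <= 0) {
                    break; // never call join(0): that would wait forever
                }
                t.join(remainingMs);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        for (Thread t : threads) {
            if (t.isAlive()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[] result = new int[1];
        Thread worker = new Thread(() -> result[0] = 42);
        worker.start();
        boolean done = joinAll(5, TimeUnit.SECONDS, worker);
        // Once isAlive() has reported that worker terminated, worker's writes
        // happen-before this read (JLS 17.4.4).
        System.out.println(done + " " + result[0]); // true 42
    }
}
```

With the question's code this would be something like `joinAll(1, TimeUnit.SECONDS, t1, t2)` in place of `Thread.sleep(1000)`; a `false` result means t1 is still spinning (and since t1 is not a daemon thread, the JVM would still not exit).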

Another problem is that your stop variable is not volatile, so other threads may never see the updated value.
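The volatile fix is a one-word change to the question's class. A sketch, reusing the question's t1/t2 structure and replacing main's `Thread.sleep(1000)` with `join` (the class name `InfiniteLoopFixed` and the `runDemo` helper are made up for this example):

```java
public class InfiniteLoopFixed {
    // The change that matters: stop is volatile, so t2's write
    // happens-before t1's subsequent reads of it.
    volatile boolean stop = false;

    Boolean another = null;

    // Same t1/t2 setup as the question; returns true if t1's loop ended.
    static boolean runDemo() {
        final InfiniteLoopFixed iv = new InfiniteLoopFixed();

        Thread t1 = new Thread(() -> {
            while (!iv.stop) {
                iv.another = iv.stop;
            }
            System.out.println("done");
        });
        t1.setDaemon(true); // safety net for the demo only

        Thread t2 = new Thread(() -> {
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            iv.stop = true;
            System.out.println("t2 end");
        });

        t2.start();
        t1.start();
        try {
            t1.join(5_000); // wait for t1 instead of sleeping in main
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !t1.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("t1 finished: " + runDemo());
    }
}
```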

This other answer has information regarding boolean values and L1/L2 caches across processors: "How is memory inconsistency different from thread interleaving?"

Relevant quote:

Thread A reads the boolean value: it will necessarily read false (see the previous bullet point). When this read happens, it might happen that some memory pages, including the one containing that boolean, will be cached into Core 1's L1 cache, or L2 cache -- any cache local to that specific core.

Thread A negates and stores the boolean value: it will store true now. But the question is: where? Until a happens-before occurs, Thread A is free to store this new value only in the cache local to the core running the thread. Therefore, it is possible that the value will be updated in Core 1's L1/L2 cache, but will remain unchanged in the processor's L3 cache or in RAM.

After some time (according to your wall clock), Thread B reads the boolean value: if Thread A didn't flush the changes into L3 or RAM, it's entirely possible that Thread B will read false. If Thread A, on the other hand, had flushed the changes, it is possible that Thread B will read true (but still not guaranteed -- Thread B might have received a copy of Thread M's view of the memory and, due to the lack of happens-before, it won't go to the RAM again and will still see the original value).

The only way to guarantee anything is to have an explicit happens-before: it would force Thread A to flush its memory, and would force Thread B to read not from a local cache, but truly read it from the "authoritative" source.

Without a happens-before, as you can see from the example above, anything can happen, no matter how much time (from your perspective) elapses between events in different threads.

...

If that boolean variable is marked as volatile... then, and only then, Thread B is guaranteed to see true (i.e., otherwise there are no guarantees at all).

The reason is that volatile helps establish happens-before relationships. It goes as follows: a write to a volatile variable happens-before any subsequent read to the same variable.
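The last sentence of the quote can be shown with the usual publication idiom: a plain field written before a volatile flag is guaranteed to be visible to a thread that has read the flag as `true`, because happens-before is transitive. A minimal sketch (`VolatilePublication`, `publish`, `awaitPayload` and `publishAndRead` are illustrative names):

```java
public class VolatilePublication {
    private int payload;             // plain field, deliberately NOT volatile
    private volatile boolean ready;  // volatile flag that carries the happens-before edge

    void publish(int value) {
        payload = value;  // (1) plain write
        ready = true;     // (2) volatile write; (1) happens-before (2) by program order
    }

    int awaitPayload() {
        while (!ready) {  // (3) volatile read; once it sees (2), (2) happens-before (3)
            // spin
        }
        return payload;   // (4) (1) hb (2) hb (3) hb (4), so this read must see the value
    }

    // Publishes `value` from the calling thread and returns what a reader thread saw.
    static int publishAndRead(int value) {
        VolatilePublication box = new VolatilePublication();
        int[] seen = new int[1];
        Thread reader = new Thread(() -> seen[0] = box.awaitPayload());
        reader.start();
        box.publish(value);
        try {
            reader.join(); // join makes the reader's write to seen[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(publishAndRead(42)); // 42
    }
}
```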

In fact, thread t1 never ends. `iv.another` is initialized to `null`, so if `iv.another` prints `false`, then `iv.another = iv.stop` has definitely executed. I made thread t2 sleep for 100 ms and the main thread for 1000 ms, which guarantees that "t2 end" is printed before `iv.another`. — XiangZzz, Mar 13 '17 at 14:35
I'm afraid the answer you refer to just says what will happen, but doesn't explain why it doesn't respect cache coherency. — XiangZzz, Mar 13 '17 at 15:11
@XiangZzz Well, it literally means that cache coherency is triggered by your code; it's not applied by default, as cache synchronization is costly and shouldn't be applied to every program. — bashnesnos, Mar 13 '17 at 16:18
@bashnesnos I had the same opinion as you before, but processor memory models come in stronger and weaker types, and memory barriers are unnecessary on a strong memory model most of the time (though maybe not always). I guess this means that cache coherency is triggered automatically on a strong memory model, and my CPU is exactly such a strong-memory-model processor. — XiangZzz, Mar 14 '17 at 03:21
@bashnesnos You can check this opinion [here](http://www.cs.umd.edu/~pugh/java/memoryModel/jsr-133-faq.html#whatismm), it says: Some processors exhibit a strong memory model, where all processors see exactly the same value for any given memory location at all times. Other processors exhibit a weaker memory model, where special instructions, called memory barriers, are required to flush or invalidate the local processor cache in order to see writes made by other processors or make writes by this processor visible to others. — XiangZzz, Mar 14 '17 at 03:30
@XiangZzz Correct, but if your CPU is an Intel Core i7-4700HQ, it's one that requires memory barriers. Why do you think your CPU doesn't need them? — bashnesnos, Mar 14 '17 at 06:55
@bashnesnos Is that true? I forget where I got the impression that my CPU has a strong memory model. Could you tell me how to check this? — XiangZzz, Mar 14 '17 at 07:20
@XiangZzz That's a lot of reading (see Chapter 8): https://software.intel.com/sites/default/files/managed/7c/f1/253668-sdm-vol-3a.pdf — bashnesnos, Mar 14 '17 at 07:46
