2

The code below doesn't ever print Done. However, if I uncomment System.out.println(list.size()); it finishes. Why does this happen?

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;

public class Test3 {

    static BlockingQueue<String> blockingQueue;
    static List<String> list = new ArrayList<>();

    static void run()  {
        while (true) {
            try {
                String s = blockingQueue.take();
                Thread.sleep(1000);
                list.add(s);
            }catch (InterruptedException e){}
        }
    }

    public static void main(String[] args)  {
        blockingQueue = new LinkedBlockingQueue<>();
        blockingQueue.add("test1");
        blockingQueue.add("test2");
        blockingQueue.add("test3");

        CompletableFuture.runAsync(() -> run());

        while (list.size() < 3) {
            // uncomment the line below to make it finish
            //System.out.println(list.size());
        }

        System.out.println("Done");
    }
}

I don't seem to have any idea.

T.J. Crowder
  • 1,031,962
  • 187
  • 1,923
  • 1,875
pavel_orekhov
  • 1,657
  • 2
  • 15
  • 37

2 Answers2

2

The compiler is free to optimize

    while (list.size() < 3) {
        // uncomment the line below to make it finish
        //System.out.println(list.size());
    }

to

    if (list.size() < 3) {
        while(1);
    }

Why? Because this just looks like the CPU is reading the list's size from cache, and CPUs are allowed to have caches and read from them.

Note that it's not that the CPU actually has a cache or that anything is actually read from a cache. It's just that CPUs are allowed to have caches and read from them and your code has to take that into account, which you failed to do.

Java provides any number of synchronization primitives that you can use to solve this problem.

This may seem like a weird decision for a language to make. But cross-thread synchronization is only needed in a limited number of cases and the number of optimizations allowed by assuming that cross-thread synchronization is not needed is vast and significant. So the onus is put on code that requires synchronization to request it.

When you hear Java's memory semantics described in terms of flushing caches to main memory, understand that that's an abstraction. The system behaves as if it had caches that needed to be flushed. Synchronization functions behave as if they flushed data to, or read data from, main memory. But what's actually happening under the hood on most systems is entirely different because CPU cache coherence is guaranteed in hardware on most realistic multi-core CPUs you're likely to run Java code on.

It's somewhat amusing, but this mythical caching and flushing allows code to be more efficient even though it does not actually exist in realistic modern hardware! The fact that programmer's are required to act as if such caching existed and such flushing was required permits optimizations to be made in code generation that result in significant performance improvements.

David Schwartz
  • 179,497
  • 17
  • 214
  • 278
  • From the compiler perspective "list.size()" is a function call. Compiler can optimize function calls in the way described above ONLY IF they don't contain thread synchronisation code, which would not be obvious only from the code snippet you put... So would suggest adjusting "free to optimize" to "free to optimize, if function does not contain thread synchronisation" – vladmihaisima Jan 10 '19 at 14:44
  • @vladmihaisima I don't like to say things like that. The compiler is free to optimize them even if they do contain thread synchronization, it just has to optimize them in ways that don't break the synchronization. In fact, I was debating adding another paragraph that says almost the opposite of what you suggest. – David Schwartz Jan 10 '19 at 16:11
  • 1
    AFAIK, this optimization is not actually applied like this. Do you have a reference that supports your claim? – Mark Rotteveel Jan 10 '19 at 16:35
  • @MarkRotteveel Read my answer very carefully. I didn't say it was actually applied like that, I said the compiler was "free to optimize" not that it actually does. This difference is critical. – David Schwartz Jan 10 '19 at 19:01
  • _"The compiler is free to optimize ..."_ implies that it does, otherwise the rest of your answer doesn't make much sense to me: if the compiler (or jit or JVM) doesn't work like that, then caching behaviour would be a very probable cause (but you say CPU caches don't work like that). – Mark Rotteveel Jan 11 '19 at 09:56
  • @MarkRotteveel I explained why it can't be actual CPU caching behavior. Things don't stay in CPU caches for several seconds when a CPU is used by a modern OS that switches tasks all the time. (And, also, if you understand how modern CPUs work, their caches don't work like that at all. Modern CPUs have to have CPU cache coherency hardware because their memory is too slow to synchronize through without unacceptable performance penalties.) – David Schwartz Jan 11 '19 at 19:07
  • @DavidSchwartz: this is not about CPU caches. Example: assume list.size() has inside a function that updates a log with a timestamp (perfectly reasonable). The compiler, in that case, can't move the function outside the loop (so, "compiler is not free to optimize") because it would affect described behaviour (many messages in log). So, that's why I suggest to improve the answer by mentioning the compiler can move list.size() only if it is not doing anything "else" (with "else" = anything with side effects outside the thread, which would imply synchronisation) – vladmihaisima Jan 12 '19 at 19:06
  • @vladmihaisima But it can move it, even if it has side effects outside that thread by moving it and keeping the side effects the same, if it had some way to do that. Can you be absolutely sure that no future Java implementation can possibly figure out some way to do that? The whole point of my answer is to not give answers based on what future compilers might or might not be able to do. You are suggesting that I totally undermine my entire point! – David Schwartz Jan 12 '19 at 22:03
  • @DavidSchwartz: it seems you agree that "the compiler is free to move it, if it can keep the side effects the same". This is not specified at all in the current answer. Now it reads "the compiler is free to move it because this just looks like the CPU is reading from cache". Which is not correct unless the method size is defined as "int list() { return r; }", which would be hard to verify for all versions of the library out there, so I would not rely on. – vladmihaisima Jan 14 '19 at 09:19
  • @vladmihaisima The point is not that you rely on the compiler being free to move it. The point is that you understand that the compiler may be free to move it. You don't really have to care what the implementation actually does, only understand what it *might* do and ensure that nothing that it might possibly do can cause your code to break. – David Schwartz Jan 14 '19 at 09:26
  • @DavidSchwartz: the answer is "The compiler is free to optimize to . Why? Because this just looks like the CPU is reading from cache.". This is confusing because list.size() is not a read from a cache but a method call. The compiler can inspect the list.size, determine it is only a read (depends on the implementation) and only then it is free to optimize. This is about making an answer clear, complete and non-confusing for beginners. Considering you put the effort to explain the rest, the first would benefit for explaining where is the read from cache. I will not comment more... – vladmihaisima Jan 14 '19 at 11:16
0

According to: https://docs.oracle.com/javase/7/docs/api/java/util/ArrayList.html ArrayList are not thread-safe, to quote:

Note that this implementation is not synchronized. If multiple threads access an ArrayList instance concurrently, and at least one of the threads modifies the list structurally, it must be synchronized externally. (A structural modification is any operation that adds or deletes one or more elements, or explicitly resizes the backing array; merely setting the value of an element is not a structural modification.) This is typically accomplished by synchronizing on some object that naturally encapsulates the list. If no such object exists, the list should be "wrapped" using the Collections.synchronizedList method.

To fix use:

static List<String> list = java.util.Collections.synchronizedList(new ArrayList());
vladmihaisima
  • 2,119
  • 16
  • 20