
Consider the following piece of code (which isn't quite what it seems at first glance).

static class NumberContainer {

    int value = 0;

    void increment() {
        value++;
    }

    int getValue() {
        return value;
    }
}

public static void main(String[] args) {

    List<NumberContainer> list = new ArrayList<>();
    int numElements = 100000;
    for (int i = 0; i < numElements; i++) {
        list.add(new NumberContainer());
    }

    int numIterations = 10000;
    for (int j = 0; j < numIterations; j++) {
        list.parallelStream().forEach(NumberContainer::increment);
    }

    list.forEach(container -> {
        if (container.getValue() != numIterations) {
            System.out.println("Problem!!!");
        }
    });
}

My question is: In order to be absolutely certain that "Problem!!!" won't be printed, does the "value" variable in the NumberContainer class need to be marked volatile?

Let me explain how I currently understand this.

  • In the first parallel stream, NumberContainer-123 (say) is incremented by ForkJoinWorker-1 (say). So ForkJoinWorker-1 will have an up-to-date cache of NumberContainer-123.value, which is 1. (Other fork-join workers, however, will have out-of-date caches of NumberContainer-123.value - they will store the value 0. At some point, these other workers' caches will be updated, but this doesn't happen straight away.)

  • The first parallel stream finishes, but the common fork-join pool worker threads aren't killed. The second parallel stream then starts, using the very same common fork-join pool worker threads.

  • Suppose, now, that in the second parallel stream, the task of incrementing NumberContainer-123 is assigned to ForkJoinWorker-2 (say). ForkJoinWorker-2 will have its own cached value of NumberContainer-123.value. If a long period of time has elapsed between the first and second increments of NumberContainer-123, then presumably ForkJoinWorker-2's cache of NumberContainer-123.value will be up-to-date, i.e. the value 1 will be stored, and everything is good. But what if the time elapsed between the first and second increments of NumberContainer-123 is extremely short? Then perhaps ForkJoinWorker-2's cache of NumberContainer-123.value might be out of date, storing the value 0, causing the code to fail!

Is my description above correct? If so, can anyone please tell me what kind of time delay between the two incrementing operations is required to guarantee cache consistency between the threads? Or if my understanding is wrong, then can someone please tell me what mechanism causes the thread-local caches to be "flushed" in between the first parallel stream and the second parallel stream?

Kenny Wong
    I hate it when I'm only 80% sure I know the answer. :-) – T.J. Crowder Aug 24 '18 at 17:14
  • Let me get this straight: what *if* it was marked *volatile*, how would that change things? volatile is about subsequent actions: someone writes to a volatile, someone else has to observe that write; this is what establishes a happens-before. So simply adding volatile would not have fixed things, in case they were broken – Eugene Aug 25 '18 at 18:47
  • @Eugene If NumberContainer-123.value was marked volatile, that would establish a happens-before relationship between the first parallel stream writing the value 1 and the second parallel stream reading the value. (The fork-join worker that writes in the first parallel stream could be different from the fork-join worker that reads in the second parallel stream. The volatile marker would then ensure that the value read by the second parallel stream is up-to-date as of the write operation done by the first parallel stream. Which would fix things, if the fork-join pool were broken.) – Kenny Wong Aug 25 '18 at 20:42
  • @KennyWong not entirely sure I follow you here... what instance or primitive would you mark as volatile? – Eugene Aug 25 '18 at 20:47
  • @Eugene I would mark int value (member of NumberContainer) as volatile. – Kenny Wong Aug 25 '18 at 20:47
  • @KennyWong now it makes sense, thank you for the comments – Eugene Aug 25 '18 at 20:59

1 Answer


It should not need any delay. By the time you're out of the parallel stream's forEach, all the tasks have finished. That establishes a happens-before relation between the increments and the end of forEach. All the forEach calls are ordered by being called from the same thread, and the final check, similarly, happens-after all the forEach calls.

int numIterations = 10000;
for (int j = 0; j < numIterations; j++) {
    list.parallelStream().forEach(NumberContainer::increment);
    // here, everything is "flushed", i.e. the ForkJoinTask is finished
}

Back to your question about the threads: the trick here is that the threads are irrelevant. The memory model hinges on the happens-before relation, and the fork-join task ensures a happens-before relation between the call to forEach and the operation body, and between the operation body and the return from forEach (even if the returned value is Void).
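To make that visibility guarantee concrete, here is a minimal, self-contained sketch (class and field names are mine, chosen for illustration): plain, non-volatile fields are written by parallel-stream workers, yet the calling thread reliably sees the final values, because each forEach returns only after all of its tasks have completed.

```java
import java.util.ArrayList;
import java.util.List;

public class HappensBeforeDemo {
    // Deliberately non-volatile: the happens-before edges come from
    // the fork-join machinery behind parallelStream(), not from the field.
    static class Counter { int value = 0; }

    public static void main(String[] args) {
        List<Counter> list = new ArrayList<>();
        for (int i = 0; i < 1_000; i++) {
            list.add(new Counter());
        }

        for (int j = 0; j < 100; j++) {
            // forEach returns only after every task has completed;
            // that completion happens-before the next statement.
            list.parallelStream().forEach(c -> c.value++);
        }

        // These reads happen-after all the writes above, so every
        // counter must be exactly 100.
        boolean allCorrect = list.stream().allMatch(c -> c.value == 100);
        System.out.println(allCorrect);  // prints true
    }
}
```

Note that within a single forEach the increments on one Counter could still race if two tasks touched the same element; here each element is incremented exactly once per pass, which is what makes the plain field safe.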

See also Memory visibility in Fork-join

As @erickson mentions in comments,

If you can't establish correctness through happens-before relationships, no amount of time is "enough." It's not a wall-clock timing issue; you need to apply the Java memory model correctly.

Moreover, thinking about it in terms of "flushing" the memory is wrong, as there are many more things that can affect you. Flushing, for instance, is trivial: I have not checked, but I can bet there's just a memory barrier on the task completion. Yet you can still get wrong data because the compiler decided to optimise non-volatile reads away (the variable is not volatile and is not changed in this thread, so it's not going to change, so we can allocate it to a register, et voilà), to reorder the code in any way allowed by the happens-before relation, and so on.
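As a hypothetical illustration of that register-allocation hazard (this sketch is mine, not from the original answer, and whether the loop actually spins forever depends on the JIT): a worker thread polls a plain boolean, and without a happens-before edge the main thread's write may never become visible to it.

```java
public class HoistingHazard {
    // Deliberately NOT volatile: the JIT is allowed to hoist the read
    // out of the loop, effectively turning it into while (true) { }.
    static boolean running = true;

    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> {
            while (running) { }  // no happens-before edge with the write below
            System.out.println("worker saw the write");
        });
        worker.setDaemon(true);  // let the JVM exit even if the worker spins
        worker.start();
        Thread.sleep(100);
        running = false;         // plain write: may never be observed
        worker.join(1000);       // bounded wait; the worker may still be alive
        System.out.println(worker.isAlive() ? "worker still spinning"
                                            : "worker stopped");
    }
}
```

On a typical HotSpot server JIT the worker often never stops; marking `running` volatile (or using any other happens-before mechanism) removes the hazard.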

Most importantly, all those optimizations can and will change over time, so even if you went to the generated assembly (which may vary depending on the load pattern) and checked all the memory barriers, that would not guarantee that your code works unless you can prove that your reads happen-after your writes, in which case the Java Memory Model is on your side (assuming there's no bug in the JVM).

As for the great pain, it's the very goal of ForkJoinTask to make the synchronization trivial, so enjoy. It was (it seems) done by marking the java.util.concurrent.ForkJoinTask#status volatile, but that's an implementation detail you should not care about or rely upon.
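To illustrate the mechanism alluded to here (a toy sketch of the idea only, not ForkJoinTask's actual implementation): everything written before a volatile write becomes visible to a thread that subsequently reads that volatile, which is how a single volatile status field can publish all of a task's plain-field side effects.

```java
public class VolatileHandoff {
    static int payload = 0;                // plain, non-volatile data
    static volatile boolean done = false;  // stand-in for a "status" field

    public static void main(String[] args) {
        Thread producer = new Thread(() -> {
            payload = 42;  // plain write...
            done = true;   // ...published by the subsequent volatile write
        });
        producer.start();

        while (!done) { }  // volatile read; loop exits once the write is seen

        // The volatile read of `done` happens-after the volatile write,
        // so the earlier plain write to `payload` is visible here too.
        System.out.println(payload);  // prints 42
    }
}
```

This is the same shape of guarantee the OP gets for free from the fork-join pool: one volatile handoff publishes arbitrarily many plain writes.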

alf
    The `parallelStream` one; edited a bit to make it more clear – alf Aug 24 '18 at 17:37
  • Thanks for this - I appreciate it very much! I'll read your link carefully, but it might take me a bit of time because I'm quite new to this stuff... – Kenny Wong Aug 24 '18 at 17:47
    I was going to answer to drive home this point, "the trick here is, the threads are irrelevant," but I'll just comment here instead: If you can't establish correctness through `happens-before` relationships, *no amount of time* is "enough." It's not a wall-clock timing issue; you need to apply the Java memory model correctly. – erickson Aug 24 '18 at 18:05
  • Great - that link made sense. Just to replay this to you in my own words: the ForkJoinTask.join() does the "memory flushing". So do I understand correctly that this memory flushing is something that the creators of the fork-join pool took great pains to guarantee? And was this achieved by marking the result of the fork-join task volatile (which, according to the linked answer, is sufficient, even if the result is a Void and the "real effect" of the fork-join task is the modifications to the values stored in my silly NumberContainers)? – Kenny Wong Aug 24 '18 at 18:06
    @KennyWong updating the answer... note also the brilliant comment by erickson – alf Aug 24 '18 at 19:27
  • Brilliant - thank you so much for your time and effort! – Kenny Wong Aug 24 '18 at 19:39