7

Brian Goetz's wrote a nice article on fork-join at http://www.ibm.com/developerworks/java/library/j-jtp03048.html. In it, he lists a merge sort algorithm using the fork-join mechanism, in which he performs the sort on two sides of an array in parallel, then merges the result.

The algorithm sorts on two different sections of the same array simultaneously. Why isn't an AtomicIntegerArray or some other mechanism necessary to maintain visibility? What guarantee is there that one thread will see the writes done by the other, or is this a subtly bug? As a follow up, does Scala's ForkJoinScheduler also make this guarantee?

Thanks!

Joshua Hartman
  • 1,216
  • 1
  • 11
  • 18
  • They're operating on different sections of the array. There's no contention until it comes to merging. – Anon. Jan 26 '11 at 00:58
  • 3
    I agree that they're operating on different sections. But the semantics of Java's memory model more or less say that not all threads are guaranteed to see all writes (unless the variable is volatile). According to this blog: http://jeremymanson.blogspot.com/2009/06/volatile-arrays-in-java.html even using a volatile int[] isn't enough to guarantee that other threads see your writes to the array – Joshua Hartman Jan 26 '11 at 01:06

2 Answers2

9

The join (of ForkJoin) itself requires a synchronization point, thats the most important piece of information. A synchronization point will ensure that all writes that happen are visible after said point.

If you take a look at the code you can see where the synchronization point occurs. This is just one method call invokeAll

public static void invokeAll(ForkJoinTask<?> t1, ForkJoinTask<?> t2) {
    t2.fork();
    t1.invoke();
    t2.join();
}

Here t2 forks into another process, t1 executes its task and that calling thread will wait on t2.join(). When passing t2. All writes to t1 and t2 will then be visible.

Edit: This edit is just to give a little more of an explanation of what I meant by synchronization point.

Lets say that you have two variables

int x;
volatile int y;

Any time you write to y all writes that happened before you read y will be available. For example

public void doWork(){
   x = 10;
   y = 5;
}

If another thread reads y = 5 that thread is guaranteed to read x = 10. This is because the write to y creates a synchronization point in which all writes before said point will be visible after the write.

With the Fork Join pool the join of a ForkJoinTask will create a synchronization point. Now if t2.fork() and t1.invoke() the joining of t2 will ensure that all writes that previously happened will be seen. Since all the previous writes are within the same structure it will be safe for visibility.

I would be happy to explain further if that isnt as clear.

John Vint
  • 39,695
  • 7
  • 78
  • 108
  • Is `invokeAll` called by `coInvoke`? – Daniel C. Sobral Jan 26 '11 at 10:56
  • And, just to complement the answer, see also http://java.sun.com/docs/books/jls/third_edition/html/memory.html#64058. – Daniel C. Sobral Jan 26 '11 at 11:13
  • @Danciel C. Sobral interesting you ask, when I was writing this example I was actually looking for the coInvoke method. It seems that the coInvoke method itself has been purged from the source code. That is at least I cannot find it any longer. – John Vint Jan 26 '11 at 14:32
  • According to the document Daniel posted: "All actions in a thread happen-before any other thread successfully returns from a join() on that thread." I started reading the source code myself - it's very, very tricky stuff. It looks like the join guarantees the visibility though. I'd love to have some more details on this, but I'll mark this answer correct. – Joshua Hartman Jan 26 '11 at 21:55
  • @Joshua Hartman feel free to look at my edit for more of an explanation – John Vint Jan 27 '11 at 00:21
1

Just a guess: merge includes joining on a Thread, and the join guarantees the visibility.

The second part is sure; I don't know how is merge implemented.

maaartinus
  • 44,714
  • 32
  • 161
  • 320