
According to this article, there are some serious flaws in the Fork-Join architecture in Java. As I understand it, Streams in Java 8 use the Fork-Join framework internally. We can easily make a stream parallel by calling the parallel() method. But when we submit a long-running task to a parallel stream, it blocks all the threads in the pool (check this). This kind of behaviour is not acceptable for real-world applications.

My question is: what considerations should I take into account before using these constructs in high-performance applications (e.g. equity analysis, stock market ticker, etc.)?

Tagir Valeev
akhil_mittal
  • I'm not convinced that the author of that article is entirely unbiased. Look at who he works for, and what they do. I'm not saying that he is wrong or dishonest, but he / his company is offering a product that appears to be competing with Oracle's standard fork-join framework. Read what he says about "serious flaws" with that in mind. – Stephen C Dec 26 '14 at 12:33
  • @Stephen If you care to address me directly I would give you the lengthy history of where the articles came from. The open-source products I maintain go back 30 years. – edharned Dec 26 '14 at 15:12
  • @edharned I am migrating my project to Java 8 and planning to use Streams. This is a critical stock market application that performs a lot of computation in a reasonable time. But Streams use only the FJ framework, so what points should I take care of before using them? – akhil_mittal Dec 27 '14 at 08:26
  • @edharned - As I said, I'm merely warning people to use their critical faculties when reading your article. (One should always do that, but there are a lot of people who are inclined to believe *anything* they read. The OP seemed to have read your article and accepted it as an authority ... and it was not clear why.) – Stephen C Dec 27 '14 at 11:10

3 Answers


The considerations are similar to other uses of multiple threads.

  • Only use multiple threads if you know they help. The aim is not to use every core you have, but to have a program which performs to your requirements.
  • Don't forget multi-threading comes with an overhead, and this overhead can exceed the value you get.
  • Multi-threading can experience large outliers. When you test performance you should look not only at throughput (which should be better) but also at the distribution of your latencies (which is often worse in extreme cases).
  • For low latency, switch between threads as little as possible. If you can do everything in one thread, that may be a good option.
  • For low latency, you don't want to play nice; instead you want to minimise jitter by doing things such as pinning busy-waiting threads to isolated cores. The more isolated cores you have, the fewer junk cores you have left to run things like thread pools.
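
To illustrate the point about looking at the latency distribution rather than only throughput, here is a minimal sketch. The class `LatencyProbe` and its methods are made up for this example and are not part of any library. It times repeated runs of a task and reports the median and the 99th percentile using the nearest-rank method. It does no JIT warm-up and is only meant to show the idea; for real measurements a proper harness such as JMH is the safer choice.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; not a benchmark harness.
final class LatencyProbe {

    // Written to by the timed tasks so the computed result is not simply discarded.
    static volatile double sink;

    /** Nearest-rank percentile of a sorted, non-empty array; p is in (0, 100]. */
    static long percentile(long[] sorted, double p) {
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(0, rank - 1)];
    }

    /** Runs the task the given number of times; returns per-run latencies in nanoseconds, sorted ascending. */
    static long[] measure(Runnable task, int runs) {
        long[] nanos = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            nanos[i] = System.nanoTime() - start;
        }
        Arrays.sort(nanos);
        return nanos;
    }

    public static void main(String[] args) {
        double[] data = new double[1_000_000];
        Arrays.setAll(data, i -> i * 0.5);

        long[] seq = measure(() -> sink = Arrays.stream(data).map(Math::sqrt).sum(), 200);
        long[] par = measure(() -> sink = Arrays.stream(data).parallel().map(Math::sqrt).sum(), 200);

        // Compare the tail (p99, max), not just the median.
        System.out.printf("sequential: p50=%d ns, p99=%d ns, max=%d ns%n",
                percentile(seq, 50), percentile(seq, 99), seq[seq.length - 1]);
        System.out.printf("parallel:   p50=%d ns, p99=%d ns, max=%d ns%n",
                percentile(par, 50), percentile(par, 99), par[par.length - 1]);
    }
}
```

If the parallel variant shows a better median but a much worse 99th percentile or maximum, that is the kind of outlier the list above warns about.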
Stephen C
Peter Lawrey
  • Peter, thanks for your reply. I am aware of those considerations, but my real concern is about using Streams specifically. Streams are a great tool and, along with lambda expressions, can be really helpful. But they seem to use the Fork-Join framework internally and can have serious issues in real-world applications. I want to know what concerns I need to be aware of before using them in a production environment. – akhil_mittal Dec 26 '14 at 09:47
  • @akhil_mittal don't assume that because it's easy, it's therefore good to put all over the place. Instead assume that unless you have tested that parallel helps, it is probably more trouble than it's worth. – Peter Lawrey Dec 26 '14 at 09:49
  • I got the point that it is not advisable to use parallel streams unless we get some real advantage. But even if we use regular (sequential) streams, they will still use the FJ framework. As per the link I posted, it may still be harmful for critical applications doing a lot of computational work. – akhil_mittal Dec 27 '14 at 08:21
  • @akhil_mittal can you explain how this is a problem? For serial processing it doesn't use additional threads. – Peter Lawrey Dec 27 '14 at 08:46
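
The last comment can be checked with a small sketch. The class `StreamThreads` and the helper `threadsUsed` are invented for this illustration. It records the names of the threads that execute a stream's `forEach` action: a sequential stream should report only the calling thread, while a parallel stream typically also reports common-pool worker threads (which ones, and how many, depends on the machine and the run).

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.IntStream;

// Hypothetical helper for illustration only.
final class StreamThreads {

    /** Returns the names of all threads that executed the forEach action. */
    static Set<String> threadsUsed(boolean parallel) {
        Set<String> names = ConcurrentHashMap.newKeySet();
        IntStream range = IntStream.range(0, 10_000);
        if (parallel) {
            range = range.parallel();
        }
        range.forEach(i -> names.add(Thread.currentThread().getName()));
        return names;
    }

    public static void main(String[] args) {
        // Sequential: only the calling thread does the work.
        System.out.println("sequential: " + threadsUsed(false));
        // Parallel: usually the calling thread plus ForkJoinPool.commonPool workers.
        System.out.println("parallel:   " + threadsUsed(true));
    }
}
```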

The streams API makes parallelism deceptively simple. As was stated before, whether using a parallel stream speeds up your application needs to be thoroughly analysed and tested in the actual runtime context. My own experience with parallel streams suggests the following (and I am sure this list is far from complete):

  • The cost of the operations performed on the elements of the stream versus the cost of the parallelising machinery determines the potential benefit of parallel streams. For example, finding the maximum in an array of doubles is so fast using a tight loop that the streams overhead is never worthwhile. As soon as the operations get more expensive, the balance starts to tip in favour of the parallel streams API (under ideal conditions, say, a multi-core machine dedicated to a single algorithm). I encourage you to experiment.

  • You need to have the time and stamina to learn the internals of the stream API. There are unexpected pitfalls. For example, a Spliterator can be constructed from a regular Iterator in a single statement. Under the hood, the elements produced by the iterator are first collected into an array. Depending on the number of elements produced by the Iterator, that approach becomes very, or even too, resource-hungry.

  • While the cited article makes it seem that we are completely at the mercy of Oracle, that is not entirely true. You can write your own Spliterator that splits the input into chunks that are specific to your situation rather than relying on the default implementation. Or you could write your own ThreadFactory (see the method ForkJoinPool.makeCommonPool).

  • You need to be careful not to produce deadlocks. If the tasks executed on the elements of the stream use the ForkJoinPool themselves, a deadlock may occur. You need to learn how to use the ForkJoinPool.ManagedBlocker API (which I find rather the opposite of easy to grasp). Technically, you are telling the ForkJoinPool that a thread is blocking, which may lead to the creation of additional threads to keep the degree of parallelism intact. The creation of extra threads is not free, of course.
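
As a rough illustration of the last point, here is a minimal sketch of wrapping a blocking call in a `ForkJoinPool.ManagedBlocker`. The class `BlockingCall` and the method `slowLookup` are hypothetical names invented for this example; only `ForkJoinPool.ManagedBlocker` and `ForkJoinPool.managedBlock` come from the JDK. It assumes the blocking work can be expressed as a `Supplier`.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.function.Supplier;
import java.util.stream.IntStream;

// Hypothetical wrapper for illustration only.
final class BlockingCall<T> implements ForkJoinPool.ManagedBlocker {

    private final Supplier<T> call;
    private T result;
    private boolean done;

    BlockingCall(Supplier<T> call) {
        this.call = call;
    }

    @Override
    public boolean block() {
        if (!done) {
            result = call.get(); // the potentially blocking work happens here
            done = true;
        }
        return true;             // true: no further blocking is necessary
    }

    @Override
    public boolean isReleasable() {
        return done;             // true once the result is available
    }

    /** Runs the supplier via managedBlock so the pool can compensate with a spare worker if needed. */
    static <R> R run(Supplier<R> call) {
        BlockingCall<R> blocker = new BlockingCall<>(call);
        try {
            ForkJoinPool.managedBlock(blocker);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while blocking", e);
        }
        return blocker.result;
    }

    /** Stand-in for a slow, blocking operation (for example I/O); purely an assumption for the demo. */
    static int slowLookup(int id) {
        try {
            Thread.sleep(50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return id * 2;
    }

    public static void main(String[] args) {
        int sum = IntStream.range(0, 8).parallel()
                .map(i -> BlockingCall.<Integer>run(() -> slowLookup(i)))
                .sum();
        System.out.println(sum);
    }
}
```

When `run` is called from a thread that is not a ForkJoinPool worker, `managedBlock` simply executes the blocking call on the current thread, so the wrapper is safe to use in either context.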

Just my five cents...

  • You may also care to read this other post on SO: http://stackoverflow.com/questions/20375176/should-i-always-use-a-parallel-stream-when-possible/20384377#20384377 – edharned Dec 26 '14 at 15:16
  • Thanks for the reference. I have done extensive performance tests of parallel streams and I can only concur with what's being said there. – Hans-Peter Schmid Dec 26 '14 at 17:00

The point (there are actually 17) of the articles is to point out that the F/J Framework is more of a research project than a general-purpose commercial application development framework.

Criticize the object, not the man. Trying to do that is most difficult when the main problem with the framework is that the architect is a professor/scientist not an engineer/commercial developer. The PDF consolidation downloadable from the article goes more into the problem of using research standards rather than engineering standards.

Parallel streams work fine, until you try to scale them. The framework uses pull technology; the request goes into a submission queue, the thread must pull the request out of the submission queue. The Task goes back into the forking thread's deque, other threads must pull the Task out of the deque. This technique doesn't scale well. In a push technology, each Task is scattered to every thread in the system. That works much better in large scale environments.

There are many other problems with scaling, as even Paul Sandoz from Oracle pointed out. For instance, if you have 32 cores and are doing Stream.of(s1, s2, s3, s4).flatMap(x -> x).reduce(...), then you will use at most 4 cores. The article points out, with downloadable software, that scaling does not work well and that the parquential technique is necessary to avoid stack overflows and OOME.

Use the parallel streams. But beware of the limitations.

edharned