
I want to parse multiple files to extract the required data and then write the output into an XML file. I have used the `Callable` interface to implement this. My colleague asked me to use a Java 8 feature that does this job easily. I am really confused about which one of them I should use now.

list.parallelStream().forEach(a -> {
    System.out.println(a);
});
Raghav2580
  • Can you post the code for your `Callable`? – akhil_mittal May 08 '15 at 08:37
  • Using a parallel stream without knowing its pros and cons does not make sense, IMO. – akhil_mittal May 08 '15 at 08:38
  • Looks like the ordering of your XML output could become nondeterministic with parallel processing. – Thilo May 08 '15 at 08:39
  • Is printing list items the only action you perform? – Alex Salauyou May 08 '15 at 08:40
  • So you'd rather ask on SO than your colleague? :-) – Amos M. Carpenter May 08 '15 at 08:41
  • Doesn't look like there's anything to run in parallel in that example. – ChiefTwoPencils May 08 '15 at 08:41
  • Parallel streams in Java use a `ForkJoinPool` internally, which may not always be the best fit. So in your case it may or may not perform better. http://zeroturnaround.com/rebellabs/java-parallel-streams-are-bad-for-your-health/ Why don't you benchmark your results? – akhil_mittal May 08 '15 at 08:42
  • BTW, it may be just an example, but you could save some bits and use `...forEach(System.out::println);` above. – ChiefTwoPencils May 08 '15 at 08:52
  • BTW, a parallel stream might be slower (compared to Callable) if the dataset is small, because of the overhead of splitting the work among multiple threads and joining or merging the results. http://stackoverflow.com/questions/20375176/should-i-always-use-a-parallel-stream-when-possible – Nitin Dandriyal May 08 '15 at 08:52

1 Answer


Using concurrency or a parallel stream only helps if you have independent tasks to work on. A good example of when you wouldn't do this is when you are locking on a shared resource, e.g.

// makes no sense to use parallel here.
list.parallelStream().forEach(a -> {
    // println locks System.out, so only one thread at a time can do any work.
    System.out.println(a);
});
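By contrast, work that really is independent per element can go into `map`, with the results gathered by `collect` and any shared resource such as `System.out` used only afterwards, on one thread. A minimal sketch, assuming a hypothetical `parse` method as a stand-in for the real per-file parsing; because `collect(Collectors.toList())` respects the list's encounter order, the results come back in input order even though the mapping runs in parallel.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class IndependentWork {

    // Hypothetical stand-in for "parse one file": it only reads its argument
    // and returns a new value, so no shared state is touched.
    static String parse(String fileName) {
        return "<entry name=\"" + fileName + "\"/>";
    }

    // Each element is processed independently; collect() assembles the
    // results in the list's encounter order even though map() runs in parallel.
    static List<String> parseAll(List<String> fileNames) {
        return fileNames.parallelStream()
                .map(IndependentWork::parse)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("a.xml", "b.xml", "c.xml");
        // Printing happens once, on a single thread, after the parallel part is done.
        parseAll(files).forEach(System.out::println);
    }
}
```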

However, as a general rule, I would use parallelStream for processing data instead of using the concurrency libraries directly, because:

  • a functional style of coding discourages shared mutable state. (Strictly, you are not supposed to have any mutable state in functional programming, but Java is not really a functional language.)
  • it's easier to write and understand for processing data.
  • it's easier to test whether using parallel helps or not. Most likely it won't, and you can just as easily change it back to serial.
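To illustrate the first point: a minimal sketch contrasting a shared counter updated from `forEach` with a reduction that has no shared mutable state. The word-length workload is only an assumption for illustration; both methods return the same total, but only the first depends on the counter being thread-safe.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

class SharedStateVsReduce {

    // Imperative style: every thread updates one shared counter, so it must be
    // a thread-safe type (a plain long field updated with += would lose updates).
    static long totalLengthShared(List<String> words) {
        AtomicLong total = new AtomicLong();
        words.parallelStream().forEach(w -> total.addAndGet(w.length()));
        return total.get();
    }

    // Functional style: no shared mutable state; the stream itself combines
    // the partial sums computed on each thread.
    static long totalLengthFunctional(List<String> words) {
        return words.parallelStream()
                .mapToLong(String::length)
                .sum();
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("alpha", "beta", "gamma");
        System.out.println(totalLengthShared(words));     // prints 14
        System.out.println(totalLengthFunctional(words)); // prints 14
    }
}
```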

IMHO, given that the chance of parallel coding really helping is low, the best feature of parallelStream is not how simple it is to add, but how simple it is to take out.
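As an illustration of how easily parallelism can be taken back out, here is a minimal sketch in which the same pipeline runs serially or in parallel depending on a single flag. The sum-of-squares workload and the `parallel` parameter are assumptions made only for illustration, and the timing printed by `main` is a single rough measurement; drawing real conclusions would need a proper benchmark harness.

```java
import java.util.stream.IntStream;

class ToggleParallel {

    // The same pipeline runs serially or in parallel depending on one flag;
    // removing parallelism again means flipping the flag, not rewriting code.
    static long sumOfSquares(int n, boolean parallel) {
        IntStream range = IntStream.rangeClosed(1, n);
        if (parallel) {
            range = range.parallel();
        }
        return range.mapToLong(i -> (long) i * i).sum();
    }

    public static void main(String[] args) {
        for (boolean parallel : new boolean[] {false, true}) {
            long start = System.nanoTime();
            long result = sumOfSquares(1_000_000, parallel);
            long micros = (System.nanoTime() - start) / 1_000;
            // One timing like this is only a rough check, not a benchmark.
            System.out.println((parallel ? "parallel" : "serial") + ": " + result + " in " + micros + " us");
        }
    }
}
```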

The concurrency library is better if you have ad hoc work which is difficult to model as a stream of data, e.g. a worker pool for client requests might be simpler to implement using an ExecutorService.
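A minimal sketch of such a pool, assuming requests arrive one at a time (simulated here by direct calls in `main`); the `RequestPool` class and its `handle` method are hypothetical names. Each request is wrapped in a `Callable` and handed to a fixed-size `ExecutorService`, and the caller gets a `Future` back to wait on when it needs the response.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical long-lived worker pool for client requests.
class RequestPool implements AutoCloseable {

    private final ExecutorService pool;

    RequestPool(int threads) {
        this.pool = Executors.newFixedThreadPool(threads);
    }

    // Hypothetical handler; a real server would parse the request, do I/O, etc.
    static String handle(String request) {
        return "handled " + request;
    }

    // Called whenever a request arrives; returns at once with a Future
    // that the caller can wait on when it needs the response.
    Future<String> submit(String request) {
        Callable<String> task = () -> handle(request);
        return pool.submit(task);
    }

    // Stop accepting new work and give running tasks time to finish.
    @Override
    public void close() throws InterruptedException {
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws Exception {
        try (RequestPool server = new RequestPool(2)) {
            Future<String> first = server.submit("req-1");
            Future<String> second = server.submit("req-2");
            System.out.println(first.get());  // prints "handled req-1"
            System.out.println(second.get()); // prints "handled req-2"
        }
    }
}
```

Because the pool is long-lived and tasks are submitted whenever they turn up, there is no ready-made collection to stream over, which is where an `ExecutorService` fits better than `parallelStream()`.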

Peter Lawrey