
I have an array of objects with a process() method that I want to run in parallel. I wanted to try lambdas to achieve the parallelization, so I tried this:

Arrays.asList(myArrayOfItems).forEach(item->{
    System.out.println("processing " + item.getId());
    item.process();
});

Each process() call takes about 2 seconds. I have noticed that there is still no speedup with this "parallelization" approach; everything still seems to run serially. The ids are printed in order, with a pause of 2 seconds between prints.

Probably I have misunderstood something. What is needed to execute this in parallel using lambdas (hopefully in a very condensed way)?

Azure

3 Answers

Lambdas themselves don't execute anything in parallel. Streams are capable of doing this, though.

Take a look at the method Collection#parallelStream (documentation):

Arrays.asList(myArrayOfItems).parallelStream().forEach(...);

However, note that there is no guarantee or control when it will actually go parallel. From its documentation:

Returns a possibly parallel Stream with this collection as its source. It is allowable for this method to return a sequential stream.

The reason is simple. You really need a lot of elements in your collection (like millions), or heavy work per element, for parallelization to actually pay off. The overhead introduced by parallelization is huge. Because of that, the method might choose to use a sequential stream instead if it thinks that will be faster.

Before you think about using parallelism, you should actually set up some benchmarks to test whether it improves anything. There are many examples where people blindly used it without noticing that they actually decreased the performance. Also see Should I always use a parallel stream when possible?.
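
As a rough illustration of such a measurement, here is a minimal sketch that times the same work on a sequential and a parallel stream. The Item class and the timeMillis helper are hypothetical stand-ins (they are not from the question), and process() only sleeps for 200 ms to simulate slow work. Wall-clock timing like this is only a first check; for serious measurements a benchmark harness such as JMH is the usual choice.

```java
import java.util.Arrays;
import java.util.List;

class ParallelTimingSketch {

    // Hypothetical stand-in for the question's item type; process() just sleeps.
    static class Item {
        private final int id;
        Item(int id) { this.id = id; }
        int getId() { return id; }
        void process() {
            try {
                Thread.sleep(200); // simulated slow work (the question's calls take about 2 s)
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    // Returns the elapsed wall-clock time in milliseconds for running the task once.
    static long timeMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        List<Item> items = Arrays.asList(new Item(1), new Item(2), new Item(3), new Item(4));

        long sequential = timeMillis(() -> items.stream().forEach(Item::process));
        long parallel = timeMillis(() -> items.parallelStream().forEach(Item::process));

        System.out.println("sequential: " + sequential + " ms");
        System.out.println("parallel:   " + parallel + " ms");
    }
}
```

The numbers depend on how many cores are available, so the parallel run is not guaranteed to be faster on every machine.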


You can check if a Stream is parallel by using Stream#isParallel (documentation).

If you use Stream#parallel (documentation) directly on a stream, you get a parallel version.
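
A small sketch of both checks, using a plain list of strings as a placeholder for the real items:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

class IsParallelSketch {
    public static void main(String[] args) {
        List<String> items = Arrays.asList("a", "b", "c");

        Stream<String> sequential = items.stream();
        Stream<String> viaCollection = items.parallelStream();
        Stream<String> viaStream = items.stream().parallel();

        System.out.println(sequential.isParallel());    // false
        System.out.println(viaCollection.isParallel()); // up to the collection, per the documentation quoted above
        System.out.println(viaStream.isParallel());     // true
    }
}
```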

Zabuzard
  • I think this would be the best answer, although it has one flaw: when each operation takes 2 seconds, parallelism pays off with far fewer operations than millions if the user expects a fluid experience. – Azure Mar 16 '18 at 13:16
  • That is correct. Just note that parallelism introduces a huge overhead and you shouldn't blindly use it everywhere without thinking about it or actually measuring times. – Zabuzard Mar 16 '18 at 13:18

The Collection.forEach() method just iterates over all the elements. It is called internal iteration because it leaves it up to the collection how to iterate, but it is still an iteration over all the elements.

If you want parallel processing, you have to:

  1. Get a parallel stream from the collection.
  2. Specify the operation(s) which will be done on the stream.
  3. Do something with the result if you need to.
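
The three steps above can be sketched as follows. The square method is a hypothetical placeholder for real per-element work, and collecting into a list is just one possible way to use the result.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class ThreeStepsSketch {

    // Hypothetical operation standing in for real per-element work.
    static int square(int n) {
        return n * n;
    }

    static List<Integer> squaresInParallel(List<Integer> input) {
        return input.parallelStream()          // 1. get a parallel stream from the collection
                .map(ThreeStepsSketch::square) // 2. specify the operation(s)
                .collect(Collectors.toList()); // 3. do something with the result
    }

    public static void main(String[] args) {
        System.out.println(squaresInParallel(Arrays.asList(1, 2, 3, 4))); // prints [1, 4, 9, 16]
    }
}
```

Collecting keeps the list's encounter order even though the elements may be processed on different threads.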

You may read the first part of my explanation here: https://stackoverflow.com/a/22942829/2886891

Honza Zidek

To create a parallel stream, invoke the operation .parallelStream on a Collection.

See https://docs.oracle.com/javase/tutorial/collections/streams/parallelism.html

Arrays.asList(myArrayOfItems).parallelStream().forEach(item->{
    System.out.println("processing " + item.getId());
    item.process();
});
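
For a version that can be run on its own, here is a minimal sketch with a hypothetical Item class (the question does not show the real one) whose process() sleeps briefly. It also prints the current thread name so you can see which thread handles each element.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class ParallelForEachSketch {

    // Hypothetical stand-in for the question's item type.
    static class Item {
        private final int id;
        Item(int id) { this.id = id; }
        int getId() { return id; }
        void process() {
            try {
                Thread.sleep(100); // simulated work
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    // Processes all items via a parallel stream and returns the ids that were processed.
    static Set<Integer> processAll(List<Item> items) {
        // Thread-safe set, because forEach may run on several threads at once.
        Set<Integer> processed = ConcurrentHashMap.newKeySet();
        items.parallelStream().forEach(item -> {
            System.out.println("processing " + item.getId()
                    + " on " + Thread.currentThread().getName());
            item.process();
            processed.add(item.getId());
        });
        return processed;
    }

    public static void main(String[] args) {
        List<Item> items = Arrays.asList(new Item(1), new Item(2), new Item(3), new Item(4));
        System.out.println("processed ids: " + processAll(items));
    }
}
```

Note that forEach on a parallel stream does not promise any particular order, so the ids may be printed out of sequence; forEachOrdered keeps the encounter order if that matters.
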
Andreas DM