0

I was doing some performance evaluation of Java aggeragate operations to iterate over collections. I was doing an evaluation of performance stream and parallelStream. But I found that the output of parallelStream is wrong most of the times. For example in the following code i got wrong output from parallelStream more than 80% of the time:

    public class ParallelStreamPerformance {
    static int totaleven = 0;
    public static void main(String[] args) {
        List<Integer> randomList = new ArrayList<>();
        Random rnd = new Random();
        for(int i = 0 ;i < 1000;i++) {
            int r = rnd.nextInt(500000);
            randomList.add(r);
        }

        long s1 = System.currentTimeMillis();

        randomList.stream().filter(e -> e%2 ==0).forEach(e -> count());
        System.out.println("Even: "+totaleven);
        long e1 = System.currentTimeMillis();
        System.out.println(e1 - s1);

        totaleven = 0;
        long s2 = System.currentTimeMillis();

        randomList.parallelStream().filter(e -> e%2 ==0).forEach(e -> count());
        System.out.println("Even: "+totaleven);
        long e2 = System.currentTimeMillis();
        System.out.println(e2 - s2);
    }
    public static void count() {
        totaleven++;
    }
}

My question is: Am I using parallelStream in a wrong way? Is there any way to ensure the correctness of parallelStream. Thanks

Stefan Zobel
  • 3,182
  • 7
  • 28
  • 38
Md Johirul Islam
  • 5,042
  • 4
  • 23
  • 56
  • Your benchmarking methodology is worse than worthless; it will give you answers, but they'll be wrong. See https://stackoverflow.com/questions/504103/how-do-i-write-a-correct-micro-benchmark-in-java – Brian Goetz Dec 09 '17 at 15:04
  • Thanks @BrianGoetz – Md Johirul Islam Dec 09 '17 at 15:59
  • When your actual question is about getting a wrong result, you shouldn’t clutter it with irrelevant benchmark stuff. Your code could have be shortened to ten lines and the prose about you doing a “performance evaluation” is obsolete as well. When your code doesn’t work correctly, you don’t need to measure its performance. – Holger Dec 11 '17 at 07:45

3 Answers3

4

I think your code having issue with count() method. As parallelStream will try to perform task concurrently. This method should be synchronized or you can make totaleven as Atomtic Integer. Hope it helps.

sayboras
  • 4,897
  • 2
  • 22
  • 40
  • Yes using synchronized method solve the problem. Thanks – Md Johirul Islam Dec 08 '17 at 17:03
  • 1
    So now your code is _correct_, but it is unlikely that the parallel code will be faster than the sequential (which is the only reason to use parallelism), since the threads will spend all their time waiting for a lock. What you want to do is use `count()` or `reduce()` rather than incrementing a shared mutable counter. This is easier to get right (no synchronization needed) and more parallel-friendly. – Brian Goetz Dec 09 '17 at 15:02
  • @BrianGoetz: perhaps it was a design mistake to add `forEach` to the Stream API… – Holger Dec 11 '17 at 07:47
  • @Holger We actually did discuss that, since the only purpose of any stream operation whose behavior parameters don't return anything is to do side-effects. But realistically, we would have (and should have) been laughed out of the room had we done so; that would have just been hiding our heads in the sand. (And then people would have just coopted some other terminal operation and done their side-effects inside, say, a `map()` call, and collected into an empty sink to simulate `forEach()`.) For better or worse, side-effects are part of Java. At best we can discourage them. – Brian Goetz Dec 11 '17 at 16:22
  • @BrianGoetz: perhaps, a not-so-easy-to-find name had helped, e.g. see how much the existence of `forEachOrdered` gets ignored… – Holger Dec 11 '17 at 16:42
  • @Holger It's easy to say "you should have done", and certainly any given complaint can be addressed by a "you should have done" argument, but there's a missing step to this: imagine what other complaints, that are not made now, would have been made in the world where we'd done that. In this case, I think the complaints about "why is such an important operation given such a non-obvious name" would have been overwhelming. – Brian Goetz Dec 11 '17 at 17:06
3

Instead of using the forEach to increment a counter you can use the terminal operation Stream:count

For example

totaleven = randomList.stream().filter(e -> e % 2 ==0).count();
totaleven = 0;
totaleven = randomList.parallelStream().filter(e -> e % 2 ==0).count();

totaleven would need to be changed to data type long or casting applied.

Bluurr
  • 457
  • 5
  • 6
2

What's wrong with parallel stream result? If it is too small then most likely you have problem with totaleven++ as it is not thread safe. Use AtomicInteger or any other thread safe solution.

Julian Rubin
  • 1,175
  • 1
  • 11
  • 23