
I have this following code snippet

    // List of persons with name and age
    List<Person> persons = new ArrayList<>();       

    // Adding 10,000 objects
    for (int i = 0; i < 10000; i++) {
        Person p = new Person();
        p.setName("Person " + i);
        p.setAge(i);
        persons.add(p);
    }

    long time1 = System.nanoTime();
    System.out.println("Time before stream.reduce() " + time1);
    Optional<Person> o1 = persons.stream().reduce(BinaryOperator.maxBy(Comparator.comparingInt(Person::getAge)));
    long time2 = System.nanoTime();
    System.out.println(o1.get() + "\nTime after stream.reduce() " + time2);
    System.out.println("**** Rough execution time for stream.reduce() : " + (time2 - time1) + " nano secs");


    long time3 = System.nanoTime();
    System.out.println("Time before stream.max() " + time3);
    Optional<Person> o2 = persons.stream().max((p01, p02) -> p01.getAge() - p02.getAge());
    long time4 = System.nanoTime();
    System.out.println(o2.get() + "\nTime after stream.max() " + time4);
    System.out.println("**** Rough execution time for stream.max() : " + (time4 - time3) + " nano secs");

While this might not be the ideal way to measure execution time, what I am basically trying to do here is find the oldest Person and compare the time it takes using stream.reduce() vs stream.max().

Output

Time before stream.reduce() 8834253431112
[ Person 9999, 9999]
Time after stream.reduce() 8834346269743
**** Rough execution time for stream.reduce() : 92838631 nano secs
Time before stream.max() 8834346687875
[ Person 9999, 9999]
Time after stream.max() 8834350117000
**** Rough execution time for stream.max() : 3429125 nano secs

P.S. I have run this code multiple times, changing the order of stream.max() and stream.reduce(), and found that stream.reduce() consistently takes significantly more time to produce the output than stream.max().

So is stream.max() always faster than stream.reduce()? If yes then, when should we use stream.reduce()?

Soumitri Pattnaik
    Subtract the before time from the after time so the results are more readable – Ted Cassirer Jun 02 '17 at 09:41
    Accurate benchmarks are really difficult to build. Try to change the order of the compared snippets for example (i.e. first max then reduce), you may see some differences. Anyway, max is much more readable IMHO, so I would go for max. And if you notice later that indeed, reduce would definitely improve your whole software performances, go for it. But first profiling, then optimizing ;) – sp00m Jun 02 '17 at 09:45
    `new Date().getTime()` is a cumbersome way to get `System.currentTimeMillis()`. But you should use `System.nanoTime()` for measuring elapsed time, or better read everything in [How do I write a correct micro-benchmark in Java?](https://stackoverflow.com/questions/504103/how-do-i-write-a-correct-micro-benchmark-in-java) – Holger Jun 02 '17 at 09:46
    What @sp00m said. Also, take a look at [`jmh`](http://openjdk.java.net/projects/code-tools/jmh/) for a more accurate way to create microbenchmarks. – David Conrad Jun 02 '17 at 09:46
  • @sp00m I have done swapping the order of these two operations, and found the result to be similar, I should add it in the question. And about the readability bit, yes stream.max() is more readable. – Soumitri Pattnaik Jun 02 '17 at 09:47
    Using `Comparator.comparingInt(Person::getAge)` instead of `(p01, p02) -> p01.getAge() - p02.getAge()` will be even more efficient, by the way. – Holger Jun 02 '17 at 09:47
  • @Holger thanks for both the suggestions, let me try those and I'll update the question if the behavior still remains the same. – Soumitri Pattnaik Jun 02 '17 at 09:49
    @SoumitriPattnaik even if you make those changes, the results are still a "single shot" test, which is usually not what you want. Read the link that Holger provided - twice. – Eugene Jun 02 '17 at 09:51
    Just to make it clear: it is possible that `max` is more efficient than `reduce`, but a single run with a stream of *two elements* is not sufficient to conclude that. Generally, both have the same time complexity, so the difference you might experience, even if reproducible, will become insignificant when you process a stream of thousands of elements… – Holger Jun 02 '17 at 09:53
  • @Holger can you explain why `Comparator.comparingInt(Person::getAge)` would be more efficient? – Klitos Kyriacou Jun 02 '17 at 09:56
    @Klitos Kyriacou: you don’t repeat yourself and you are not prone to overflow errors. Further, you are reusing existing code which might be used by others, hence, may have a better runtime optimization state than newly introduced custom code. – Holger Jun 02 '17 at 10:00
  • @Holger made some changes to the code and updated the question. Doing it for 10,000 objects now and the difference is staggering. – Soumitri Pattnaik Jun 02 '17 at 10:08
    @SoumitriPattnaik If you just swap the two benchmarks you will see that it is always the first one that is slower. – Didier L Jun 02 '17 at 10:27
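Holger's suggested comparator can be applied like this (a minimal sketch; the `Person` record here is a hypothetical stand-in for the original class):

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

public class ComparatorSketch {
    // Stand-in for the original Person class from the question
    record Person(String name, int age) {
        int getAge() { return age; }
    }

    public static void main(String[] args) {
        List<Person> persons = List.of(
            new Person("Person 0", 0),
            new Person("Person 1", 1),
            new Person("Person 2", 2));

        // Reusable and overflow-safe, unlike (p1, p2) -> p1.getAge() - p2.getAge()
        Optional<Person> oldest = persons.stream()
                .max(Comparator.comparingInt(Person::getAge));

        System.out.println(oldest.get().getAge()); // 2
    }
}
```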

2 Answers


The ReferencePipeline implementation of max looks as follows:

public final Optional<P_OUT> max(Comparator<? super P_OUT> comparator) {
    return reduce(BinaryOperator.maxBy(comparator));
}

So any performance difference that you observe is just an artifact of the approach that you use for measuring the performance.

Or, more clearly: The answer is No, it is not "always faster".
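The delegation is easy to verify empirically; this is a minimal sketch (not part of the original answer) showing that both calls produce the same result:

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.function.BinaryOperator;

public class MaxVsReduce {
    public static void main(String[] args) {
        List<Integer> ages = List.of(3, 9, 1, 7);
        Comparator<Integer> byValue = Comparator.naturalOrder();

        // max(comparator) delegates to exactly this reduce call internally
        Optional<Integer> viaMax = ages.stream().max(byValue);
        Optional<Integer> viaReduce =
                ages.stream().reduce(BinaryOperator.maxBy(byValue));

        System.out.println(viaMax.equals(viaReduce)); // true
    }
}
```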


Edit: Just for reference, here is a slightly adjusted version of your code. It runs the test repeatedly, for different numbers of elements. This is still not a real, reliable microbenchmark, but it is more reliable than running the whole thing only once:

import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

public class MaxReducePerformance
{
    public static void main(String[] args)
    {
        for (int n=500000; n<=5000000; n+=500000)
        {
            List<Person> persons = new ArrayList<>();
            for (int i = 0; i < n; i++)
            {
                Person p = new Person();
                p.setName("Person " + i);
                p.setAge(i);
                persons.add(p);
            }

            System.out.println("For " + n);

            long time1 = System.nanoTime();
            Optional<Person> o1 = persons.stream().reduce((p01, p02) -> 
            {
                if (p01.getAge() < p02.getAge())
                    return p02;
                return p01;
            });
            long time2 = System.nanoTime();
            double d0 = (time2 - time1) / 1e9;
            System.out.println("Reduce: "+d0+" seconds, " + o1);

            long time3 = System.nanoTime();
            Optional<Person> o2 = persons.stream().max(
                (p01, p02) -> p01.getAge() - p02.getAge());
            long time4 = System.nanoTime();
            double d1 = (time4 - time3) / 1e9;
            System.out.println("Max   : "+d1+" seconds, " + o2);
        }

    }
}


class Person
{
    String name;
    int age;

    void setName(String name)
    {
        this.name = name;
    }

    void setAge(int age)
    {
        this.age = age;
    }

    int getAge()
    {
        return age;
    }

}

The output should show that the durations are basically equal.

Marco13
    I had different benchmark results, the first one was actually faster, and the reason behind that is likely that it compares the integers directly, while the second first subtracts and compares to 0 (Or perhaps by the extra indirections caused by using a comparator). This was not optimized away, as was visible in the JIT emitted code. – Jorn Vernee Jun 02 '17 at 11:06
  • @JornVernee It cannot be optimized away, because it yields different behaviors. However, I did not analyze the JIT code here - just wanted to point out that there basically is no (measurable) difference - have the differences been significant in your case? – Marco13 Jun 02 '17 at 14:05
  • The difference was most significant for sorted input, the comparator implementation took about 50% longer, for shuffled input it was about 25%. The main thing I saw in the assembler was that the `compare` method was not inlined. Since it's a field of the `BinaryOperator` it's not seen as constant. – Jorn Vernee Jun 02 '17 at 15:29
  • 1
    What I mean by optimized away, is that you can write `x - y < 0` as `x < y`, but you can only really do that if the `compare` method is inlined. – Jorn Vernee Jun 02 '17 at 15:44
  • @JornVernee: *"...that you can write x - y < 0 as x < y,..."* - This, exactly, is *not* the case. They will behave differently for values where the subtraction causes an overflow. But again, I did not (yet) have a look at the assembly. – Marco13 Jun 02 '17 at 19:00
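The overflow case Marco13 refers to is easy to demonstrate (a minimal sketch):

```java
public class OverflowDemo {
    public static void main(String[] args) {
        int x = Integer.MIN_VALUE;
        int y = 1;

        // x - y wraps around to Integer.MAX_VALUE, so the subtraction-based
        // comparison gives the wrong answer for this pair
        System.out.println(x - y < 0); // false
        System.out.println(x < y);     // true
    }
}
```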

Your reduce function evaluates getAge twice in each iteration, which is why the result might be slower depending on compiler optimizations. Restructure your code and check the result.

Also, Stream.max might benefit from built-in VM optimizations, so you should prefer built-in functions over implementing equivalent ones yourself.

Przemysław Zalewski
    The `Stream.max` variant will call `getAge` twice per element, too. – Holger Jun 02 '17 at 09:49
  • I thought it will be implemented smarter but as I can see in OpenJDK it just calls 'reduce(BinaryOperator.maxBy(comparator))'. So the only thing that comes to my mind now is that comparators might be better optimized than this inline `if` branching. – Przemysław Zalewski Jun 02 '17 at 10:08
  • 3
    There is no way to implement it smarter, when all you have, is a `Comparator`. If the method knew that it is actually a “max by property”, it could do it smarter, but the `Comparator` interface isn’t telling… – Holger Jun 02 '17 at 10:33