2

I have recently started using collectingAndThen and found that it is taking a bit long time comparatively to the other coding procedures, which i used for performing the similar tasks.

Here is my code:

        System.out.println("CollectingAndThen");
        Long t = System.currentTimeMillis();
        String personWithMaxAge = persons.stream()
                                        .collect(Collectors.collectingAndThen(
                                                                Collectors.maxBy(Comparator.comparing(Person::getAge)),
                                                                (Optional<Person> p) -> p.isPresent() ? p.get().getName() : "none"
                                                ));


        System.out.println("personWithMaxAge - "+personWithMaxAge + " time taken = "+(System.currentTimeMillis() - t));
        Long t2 = System.currentTimeMillis();
        String personWithMaxAge2 = persons.stream().sorted(Comparator.comparing(Person::getAge).reversed())
                                                    .findFirst().get().name;
        System.out.println("personWithMaxAge2 : "+personWithMaxAge2+ " time taken = "+(System.currentTimeMillis() - t2));

And here it the output:

CollectingAndThen
personWithMaxAge - Peter time taken = 17
personWithMaxAge2 : Peter time taken = 1

which is showing that collectingAndThen is taking a greater time comparatively.

So my question is - Should I keep on collectingAndThen or there is some other suggestions?

Holger
  • 285,553
  • 42
  • 434
  • 765
KayV
  • 12,987
  • 11
  • 98
  • 148
  • it's the way u measure things... – Eugene Dec 14 '17 at 10:02
  • 4
    @Eugene That's not really a duplicate of [How to write a micro benchmark](https://stackoverflow.com/questions/504103/how-do-i-write-a-correct-micro-benchmark-in-java), although applying the recommendations in the answers may help. – assylias Dec 14 '17 at 10:06
  • 1
    Why not just `persons.stream().max(Comparator.comparing(Person::getAge)).get().name`? – Andy Turner Dec 14 '17 at 10:09
  • @AndyTurner I just wanted to post that :D Furthermore, you can replace `g‌​et().name` by `map(p -> p.name).orElse('none')` to avoid NoSuchElementExceptions. – Malte Hartwig Dec 14 '17 at 10:11
  • This discrepancy is almost certainly down to how you have set up the test. What happens if you reverse the order, so you do `personWithMaxAge2` first? – Andy Turner Dec 14 '17 at 10:12
  • @AndyTurner Yea that is what can be done, but still the collectingAndThen method would take even more time... – KayV Dec 14 '17 at 10:13
  • @AndyTurner persons contains only five Person objects, each Persons having int age and String name only... And reversing the order shows the approx same time taken. – KayV Dec 14 '17 at 10:17
  • @assylias agreed, I rushed it probably – Eugene Dec 14 '17 at 10:33
  • +1 for Andy. I do not think the use of collectingAndThen(...) makes much sense here. The expression Andy proposed is much easier and better. I use collectingAndThen(...) to wrap the result into an unmodifiable collection or unmodifiable map. – mmirwaldt Feb 06 '18 at 22:38

3 Answers3

13

collectingAndThen adds a single action that is just performed at the end of the collection.

So

String personWithMaxAge = persons.stream()
    .collect(Collectors.collectingAndThen(
        Collectors.maxBy(Comparator.comparing(Person::getAge)),
        (Optional<Person> p) -> p.isPresent() ? p.get().getName() : "none"
    ));

is not different to

Optional<Person> p = persons.stream()
    .collect(Collectors.maxBy(Comparator.comparing(Person::getAge)));
String personWithMaxAge = p.isPresent() ? p.get().getName() : "none";

The actual advantage of specifying the action within a collector shows when you use the resulting collector as input to another collect, e.g. groupingBy(f1, collectingAndThen(collector, f2)).

Since this is a single trivial action performed exactly once at the end, there is no impact on the performance. Further, it’s unlikely that a sorted based solution is faster than a maxBy operation for any nontrivial input.

You are simply using a broken benchmark violating several of the rules listed in “How do I write a correct micro-benchmark in Java?”. Most notably, you are measuring the initial initialization overhead of the first usage of the Stream framework. Just swapping the order of the two operations will give you an entirely different result.

Still, there is no need to make an operation unnecessarily complicated. As said, the advantage of collectors mirroring existing Stream operations is that they can be combined with other collectors. If no such combining is need, just use the straight-forward code

String personWithMaxAge = persons.stream()
    .max(Comparator.comparing(Person::getAge))
    .map(Person::getName).orElse("none");

which is simpler than the Collector usage and more efficient than the sorted based solution.

Holger
  • 285,553
  • 42
  • 434
  • 765
  • 3
    All of this is correct, I'll just add another reason we added `collectingAndThen`, when surely we could have let you collect the results and then apply the post-function yourself: because by baking this into the `Collector`, the intermediate collection does not escape into the client code and therefore can't be accidentally used. – Brian Goetz Dec 14 '17 at 15:12
5

TL;DR; The way you measure things is probably off.

I created a much more valid performance bench of your test, using JMH, set up as the following (the persons list should be initialized differently to further enhance the confidence in the results) :

public static void main(String[] args) throws RunnerException {
    Options opt = new OptionsBuilder().include(SO.class.getSimpleName()).jvmArgs("-Dfile.encoding=UTF-8").build();
    new Runner(opt).run();
}

public static final class Person {
    public Person(String name, int age) {
        this.name = name;
        this.age = age;
    }
    String name;
    int age;
    public int getAge() {return age;};
    public String getName() {return this.name;}
}

public static final List<Person> persons = IntStream.range(0, 100).mapToObj(i -> new Person("person" + i, RandomUtils.nextInt(0, 50))).collect(Collectors.toList());

@Benchmark
@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.SECONDS)
@Warmup(iterations = 5, time = 4, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 4, time = 5, timeUnit = TimeUnit.SECONDS)
@Fork(2)
public float collectingAndThen() {
    String personWithMaxAge = persons.stream()
            .collect(Collectors.collectingAndThen(
                    Collectors.maxBy(Comparator.comparing(Person::getAge)),
                    (Optional<Person> p) -> p.isPresent() ? p.get().getName() : "none"
            ));
    return personWithMaxAge.length() * 1.0f;
}

@Benchmark
@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.SECONDS)
@Warmup(iterations = 5, time = 4, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 4, time = 5, timeUnit = TimeUnit.SECONDS)
@Fork(2)
public float sortFindFirst() {
    String personWithMaxAge2 = persons.stream().sorted(Comparator.comparing(Person::getAge).reversed())
            .findFirst().get().name;
    return personWithMaxAge2.length() * 1.0f;
}

The results are beyond doubt on this test (with 100 persons in the list) :

# Run complete. Total time: 00:02:41

Benchmark                        Mode  Cnt        Score       Error  Units
Benchmark.SO.collectingAndThen  thrpt    8  1412304,072 ± 53963,266  ops/s
Benchmark.SO.sortFindFirst      thrpt    8   331214,270 ±  7966,082  ops/s

The collectingAndThen is 4 times faster.

If you scale down the persons list to 5 people, the figures change completly :

Benchmark                        Mode  Cnt         Score        Error  Units
Benchmark.SO.collectingAndThen  thrpt    8  14529905,529 ± 423196,066  ops/s
Benchmark.SO.sortFindFirst      thrpt    8   7645716,643 ± 538730,614  ops/s

But collectingAndThen is still 2x faster.

I suspect your test is off. There are a number of possible reasons, for example, classloading, JIT compilation and other warmups, ...

As @assylias pointed out in the comments, you should rely on better crafted micro benchmarks for measuring time of such "small" methods, to avoid the previously stated side effects. See : How do I write a correct micro-benchmark in Java?

GPI
  • 9,088
  • 2
  • 31
  • 38
5

No, collectingAndThen is fine from an efficiency perspective.

Consider this code for generating a list of random integers:

List<Integer> list =
    new Random().ints(5).boxed().collect(Collectors.toList());

You can get the max from this list using your two methods:

    list.stream().collect(Collectors.collectingAndThen(
        Collectors.maxBy(Comparator.naturalOrder()),
        (Optional<Integer> n) -> n.orElse(0)));

and

    list.stream().sorted().findFirst().get();

If you just time a single execution of these two methods, you might get a time something like this (ideone):

collectAndThen 2.884806
sortFindFirst  1.898522

Those are times in milliseconds.

But keep iterating, and you find the times change dramatically. After 100 iterations:

collectAndThen 0.00792
sortFindFirst  0.010873

Still times in milliseconds.

So, as previously stated, you're just not benchmarking the two approaches correctly.

Andy Turner
  • 137,514
  • 11
  • 162
  • 243