15

I ran across this very interesting, though year-old, presentation by Brian Goetz. In the linked slide he presents an aggregateBy() method, supposedly in the Stream API, which is meant to aggregate the elements of a list (?) into a map, given a default initial value and a function that updates the value (including for duplicate keys); see the next slide in the presentation.

Apparently there is no such method in the Stream API. Is there another method in Java 8 that does something analogous?

Mr_and_Mrs_D
  • By the way, if you are into this kind of stuff, Brian Goetz has also made [this presentation](http://www.youtube.com/watch?v=C_QbkGU_lqY); very, very instructive. – fge May 24 '14 at 18:13

3 Answers

15

The aggregate operation can be done using the Collectors class. The example in the video would be equivalent to:

Map<String, Integer> map = 
    documents.stream().collect(Collectors.groupingBy(Document::getAuthor, Collectors.summingInt(Document::getPageCount)));

On its own, groupingBy(Document::getAuthor) gives you a Map<String, List<Document>>. To sum the page counts of the documents in the List associated with each key, you need a downstream collector.

Passing summingInt as the downstream collector to groupingBy does exactly that, resulting in a Map<String, Integer>.
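To make the snippet above runnable end to end, here is a minimal sketch. The Document class is a hypothetical stand-in (the presentation does not show its definition), and the sample data is made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class PagesByAuthorDemo {

    // Hypothetical stand-in for the Document type used in the presentation.
    static final class Document {
        private final String author;
        private final int pageCount;

        Document(String author, int pageCount) {
            this.author = author;
            this.pageCount = pageCount;
        }

        String getAuthor() { return author; }
        int getPageCount() { return pageCount; }
    }

    // Groups documents by author and sums the page counts within each group.
    static Map<String, Integer> pagesByAuthor(List<Document> documents) {
        return documents.stream()
                .collect(Collectors.groupingBy(Document::getAuthor,
                        Collectors.summingInt(Document::getPageCount)));
    }

    public static void main(String[] args) {
        List<Document> documents = Arrays.asList(
                new Document("Goetz", 300),
                new Document("Bloch", 250),
                new Document("Goetz", 120));

        // Goetz maps to 420 and Bloch to 250; the map's iteration order is unspecified.
        System.out.println(pagesByAuthor(documents));
    }
}
```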


The documentation gives basically the same example, computing the sum of employees' salaries by department.

I think they removed this operation and created the Collectors class instead, so that there is one utility class containing many of the reductions you will commonly use.
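As a rough illustration of that range of reductions, the sketch below swaps in other downstream collectors (counting, averagingInt) and nests one groupingBy inside another. The Document class, its category field, and the method names are assumptions made up for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

import static java.util.stream.Collectors.averagingInt;
import static java.util.stream.Collectors.counting;
import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.Collectors.summingInt;

class DownstreamCollectorsDemo {

    // Hypothetical document type; the category field exists only for this example.
    static final class Document {
        private final String author;
        private final String category;
        private final int pageCount;

        Document(String author, String category, int pageCount) {
            this.author = author;
            this.category = category;
            this.pageCount = pageCount;
        }

        String getAuthor() { return author; }
        String getCategory() { return category; }
        int getPageCount() { return pageCount; }
    }

    // Number of documents per author.
    static Map<String, Long> countByAuthor(List<Document> docs) {
        return docs.stream().collect(groupingBy(Document::getAuthor, counting()));
    }

    // Average page count per author.
    static Map<String, Double> averagePagesByAuthor(List<Document> docs) {
        return docs.stream()
                .collect(groupingBy(Document::getAuthor, averagingInt(Document::getPageCount)));
    }

    // Two levels: author, then category, then total pages.
    static Map<String, Map<String, Integer>> pagesByAuthorAndCategory(List<Document> docs) {
        return docs.stream()
                .collect(groupingBy(Document::getAuthor,
                        groupingBy(Document::getCategory,
                                summingInt(Document::getPageCount))));
    }

    public static void main(String[] args) {
        List<Document> docs = Arrays.asList(
                new Document("Goetz", "java", 300),
                new Document("Goetz", "java", 100),
                new Document("Goetz", "talks", 40),
                new Document("Bloch", "java", 250));

        System.out.println(countByAuthor(docs));
        System.out.println(averagePagesByAuthor(docs));
        System.out.println(pagesByAuthorAndCategory(docs));
    }
}
```

Each method only changes the second argument to groupingBy, and because that argument is itself a collector, groupings can be nested to any depth.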

Alexis C.
  • In fact, `Collectors` does not help only with aggregation; you also have such basic things in it as `Collectors.toList()` etc; in the long run it can become quite confusing that all of these operations are in the same "function bag". I guess the JDK could do with an `Aggregates` class... – fge May 24 '14 at 19:03
  • @fge Maybe Brian Goetz will see this thread and explain the rationale; that would be interesting. – Alexis C. May 24 '14 at 19:10
  • 8
    Yes, aggregateBy() was an early stab at the concepts now embodied in Collector. (Though I think the example cited was well more than a year ago.) One problem with aggregateBy(), which was addressed with Collector, was that it only went "one level deep." Collectors.groupingBy() addresses this with "downstream" collectors and is far more composable. – Brian Goetz May 24 '14 at 19:11
  • @BrianGoetz OK, now that I see the `Collector` interface it makes sense that all of these are in `Collectors`, but my word it takes time to grasp the concept behind this interface! I guess an explanation of the `Collector` interface is the real answer to this question – fge May 24 '14 at 19:15
  • 1
    @fge That will be a longer explanation! But stay with it, it's worth it. It's very powerful once you grasp the concepts. – Brian Goetz May 24 '14 at 19:19
  • @BrianGoetz side question... When will we have JCIP v2? ;) – fge May 24 '14 at 20:20
3

Let's say we have a list of employees with their department and salary and we want the total salary paid by each department.

There are several ways to do it. For example, you could use the three-argument toMap collector to aggregate the data per department:

  • the first argument is the key mapper (your aggregation axis = the department),
  • the second is the value mapper (the data you want to aggregate = salaries), and
  • the third is the merging function (how you want to aggregate data = sum the values).

Example:

import java.util.Arrays;
import java.util.List;
import java.util.Map;

import static java.util.stream.Collectors.*;

public static void main(String[] args) {
  List<Person> persons = Arrays.asList(new Person("John", "Sales", 10000),
                                       new Person("Helena", "Sales", 10000),
                                       new Person("Somebody", "Marketing", 15000));

  Map<String, Double> salaryByDepartment = persons.stream()
          .collect(toMap(Person::department, Person::salary, (s1, s2) -> s1 + s2));
  System.out.println("salary by department = " + salaryByDepartment);
}

As often with streams, there are several ways to get the desired result, for example:

import static java.util.stream.Collectors.*;

Map<String, Double> salaryByDepartment = persons.stream()
        .collect(groupingBy(Person::department, summingDouble(Person::salary)));

For reference, the Person class:

static class Person {
  private final String name, department;
  private final double salary;
  public Person(String name, String department, double salary) {
    this.name = name;
    this.department = department;
    this.salary = salary;
  }
  public String name() { return name; }
  public String department() { return department; }
  public double salary() { return salary; }
}
fge
assylias
1

This particular Javadoc entry is about the closest thing I could find to this kind of aggregation. Even though it's a third-party API rather than part of Java 8, the signatures seem to line up pretty well: you provide some function to get values from, some function supplying the starting value (zero, in this case), and some function to combine the two.

It feels a lot like a Collector, which would offer us the ability to do this.

Map<String, Integer> pagesByAuthor =
    documents.stream()
             .collect(Collectors.groupingBy(Document::getAuthor,
                                            Collectors.summingInt(Document::getPageCount)));

The idea, then, is that we group the entries in our list by author name and add up the page counts of each author's documents into a Map<String, Integer>.
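For comparison, an aggregateBy-style helper with the shape described above (a key function, a starting value, and a combining function) can be sketched by hand. The name aggregateBy and this signature are hypothetical: they are not part of the JDK and are not copied from the third-party API mentioned earlier.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Function;

class AggregateBySketch {

    // Hypothetical helper: folds each item into the running value stored under
    // its key, starting from `zero` the first time a key is seen.
    // `zero` is shared by all keys, so it should be an immutable value.
    static <T, K, V> Map<K, V> aggregateBy(Iterable<T> items,
                                           Function<T, K> keyFn,
                                           V zero,
                                           BiFunction<V, T, V> fold) {
        Map<K, V> result = new HashMap<>();
        for (T item : items) {
            K key = keyFn.apply(item);
            result.put(key, fold.apply(result.getOrDefault(key, zero), item));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("apple", "avocado", "banana");

        // Total number of characters per first letter: a -> 12, b -> 6.
        Map<Character, Integer> lengths =
                aggregateBy(words, w -> w.charAt(0), 0, (sum, w) -> sum + w.length());
        System.out.println(lengths);
    }
}
```

Unlike groupingBy with a downstream collector, this helper goes only one level deep and does not plug into a stream pipeline, which is part of why the Collector version composes better.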

Makoto