60

I have a class Foo with these fields:

id:int / name:String / targetCost:BigDecimal / actualCost:BigDecimal

I get an ArrayList of objects of this class, e.g.:

new Foo(1, "P1", 300, 400), 
new Foo(2, "P2", 600, 400),
new Foo(3, "P3", 30, 20),
new Foo(3, "P3", 70, 20),
new Foo(1, "P1", 360, 40),
new Foo(4, "P4", 320, 200),
new Foo(4, "P4", 500, 900)

I want to transform these values by grouping the rows by id and summing "targetCost" and "actualCost" within each group, e.g.:

new Foo(1, "P1", 660, 440),
new Foo(2, "P2", 600, 400),
new Foo(3, "P3", 100, 40),
new Foo(4, "P4", 820, 1100)

What I have written by now:

data.stream()
       .???
       .collect(Collectors.groupingBy(PlannedProjectPOJO::getId));

How can I do that?

Holger
haisi

5 Answers

119

Using Collectors.groupingBy is the right approach, but instead of the single-argument version, which creates a list of all items for each group, you should use the two-argument version, which takes another Collector that determines how to aggregate the elements of each group.

This is especially smooth when you want to aggregate a single property of the elements or just count the number of elements per group:

  • Counting:

    list.stream()
      .collect(Collectors.groupingBy(foo -> foo.id, Collectors.counting()))
      .forEach((id,count)->System.out.println(id+"\t"+count));
    
  • Summing up one property:

    list.stream()
      .collect(Collectors.groupingBy(foo -> foo.id,
                                        Collectors.summingInt(foo->foo.targetCost)))
      .forEach((id,sumTargetCost)->System.out.println(id+"\t"+sumTargetCost));
    

In your case, where you want to aggregate more than one property, specifying a custom reduction operation as suggested in this answer is the right approach. However, you can perform the reduction during the grouping operation itself, so there is no need to collect the entire data into a Map<…,List> before performing the reduction:

(I assume you use an import static java.util.stream.Collectors.*; now…)

list.stream().collect(groupingBy(foo -> foo.id, collectingAndThen(reducing(
  (a,b)-> new Foo(a.id, a.ref, a.targetCost+b.targetCost, a.actualCost+b.actualCost)),
      Optional::get)))
  .forEach((id,foo)->System.out.println(foo));
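
The snippet above assumes int cost fields. As a minimal sketch of the same reduce-while-grouping idea with BigDecimal fields as described in the question, and collecting the result into a List instead of printing it, something like the following should work. The class name GroupAndSumSketch, the factory Foo.of and the helper Foo.plus are assumptions made for illustration; they are not part of the original code.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

class GroupAndSumSketch {
    // Hypothetical immutable Foo with BigDecimal costs, as described in the question.
    static final class Foo {
        final int id;
        final String name;
        final BigDecimal targetCost;
        final BigDecimal actualCost;

        Foo(int id, String name, BigDecimal targetCost, BigDecimal actualCost) {
            this.id = id;
            this.name = name;
            this.targetCost = targetCost;
            this.actualCost = actualCost;
        }

        // Convenience factory (an assumption, only here to keep the sample data short).
        static Foo of(int id, String name, long targetCost, long actualCost) {
            return new Foo(id, name, BigDecimal.valueOf(targetCost), BigDecimal.valueOf(actualCost));
        }

        // Returns a new Foo with this id/name and the costs of both operands added.
        Foo plus(Foo other) {
            return new Foo(id, name, targetCost.add(other.targetCost), actualCost.add(other.actualCost));
        }

        @Override
        public String toString() {
            return "Foo(" + id + "," + name + "," + targetCost + "," + actualCost + ")";
        }
    }

    // Groups by id and reduces each group to a single Foo during the grouping itself.
    static List<Foo> sumById(List<Foo> data) {
        Map<Integer, Foo> byId = data.stream()
            .collect(Collectors.groupingBy(foo -> foo.id,
                Collectors.collectingAndThen(
                    Collectors.reducing((a, b) -> a.plus(b)),
                    Optional::get))); // safe: a group always has at least one element
        // The default map makes no ordering guarantee for the resulting list.
        return new ArrayList<>(byId.values());
    }

    public static void main(String[] args) {
        List<Foo> data = Arrays.asList(
            Foo.of(1, "P1", 300, 400),
            Foo.of(2, "P2", 600, 400),
            Foo.of(3, "P3", 30, 20),
            Foo.of(3, "P3", 70, 20),
            Foo.of(1, "P1", 360, 40),
            Foo.of(4, "P4", 320, 200),
            Foo.of(4, "P4", 500, 900));
        sumById(data).forEach(System.out::println);
    }
}
```

BigDecimal has no summing collector in the JDK, which is why the sketch goes through reducing with an explicit add.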

For completeness, here is a solution for a problem beyond the scope of your question: what if you want to GROUP BY multiple columns/properties?

The first thing that comes to a programmer's mind is to use groupingBy to extract the properties of the stream’s elements and create/return a new key object. But this requires an appropriate holder class for the key properties (and Java has no general-purpose Tuple class).

But there is an alternative. By using the three-arg form of groupingBy, we can specify a supplier for the actual Map implementation, which determines the key equality. By using a sorted map with a comparator comparing multiple properties, we get the desired behavior without the need for an additional class. We only have to take care not to use properties of the key instances that our comparator ignores, as they will have arbitrary values:

list.stream().collect(groupingBy(Function.identity(),
  ()->new TreeMap<>(
    // we are effectively grouping by [id, actualCost]
    Comparator.<Foo,Integer>comparing(foo->foo.id).thenComparing(foo->foo.actualCost)
  ), // and aggregating/ summing targetCost
  Collectors.summingInt(foo->foo.targetCost)))
.forEach((group,targetCostSum) ->
    // take the id and actualCost from the group and the targetCost sum from the aggregation
    System.out.println(group.id+"\t"+group.actualCost+"\t"+targetCostSum));
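
For comparison, the "key object" route mentioned above does not strictly need a dedicated holder class: a List can serve as a composite key, since List implements equals and hashCode element-wise. This is a swapped-in alternative to the TreeMap approach, not what the answer itself uses. A minimal sketch, assuming a Foo with int fields as in the snippets above (the class and method names are made up for illustration):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class GroupByTwoPropertiesSketch {
    // Hypothetical Foo with public int fields, mirroring the field names used above.
    static final class Foo {
        final int id, targetCost, actualCost;
        final String ref;

        Foo(int id, String ref, int targetCost, int actualCost) {
            this.id = id;
            this.ref = ref;
            this.targetCost = targetCost;
            this.actualCost = actualCost;
        }
    }

    // Groups by [id, actualCost] using a List as composite key and sums targetCost per group.
    static Map<List<Integer>, Integer> sumTargetCostByIdAndActualCost(List<Foo> list) {
        return list.stream().collect(Collectors.groupingBy(
            foo -> Arrays.asList(foo.id, foo.actualCost),
            Collectors.summingInt(foo -> foo.targetCost)));
    }

    public static void main(String[] args) {
        List<Foo> list = Arrays.asList(
            new Foo(1, "P1", 300, 400),
            new Foo(3, "P3", 30, 20),
            new Foo(3, "P3", 70, 20),
            new Foo(1, "P1", 360, 40));
        sumTargetCostByIdAndActualCost(list).forEach((key, targetCostSum) ->
            System.out.println(key.get(0) + "\t" + key.get(1) + "\t" + targetCostSum));
    }
}
```

The trade-off is that the key loses its field names (key.get(0) rather than group.id), so this is mostly useful for quick, local aggregations.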
Community
Holger
  • Nice, I actually never used those methods of `Collectors`. That should be the accepted answer – Dici Oct 13 '14 at 20:00
  • @Holger How to do that in Java 7 please? – hamza-don Jun 01 '15 at 19:14
  • @don-kaotic: that’s an entirely different question – Holger Jun 02 '15 at 08:32
  • @hamza-don I believe by now you know it is not possible in Java 7 – Sayantan Nov 23 '17 at 19:20
  • In my case, I have a class named XYZ that has a list of elements of type "Foo". I want to group as per the above logic and then replace the list with the reduced Foo list, i.e. do something like xyz.setFooList(performAboveOperation(xyz.getFooList())). Any suggestions? How do I collect the result into a list instead of calling forEach on that stream? – doga Apr 17 '18 at 13:22
  • @doga I think you should ask a new question, including what you have tried and a backlink to this Q&A if you like, to provide more context. – Holger Apr 17 '18 at 14:39
22

Here is one possible approach:

import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class Test {
    private static class Foo {
        public int id, targetCost, actualCost;
        public String ref;

        public Foo(int id, String ref, int targetCost, int actualCost) {
            this.id = id;
            this.targetCost = targetCost;
            this.actualCost = actualCost;
            this.ref = ref;
        }

        @Override
        public String toString() {
            return String.format("Foo(%d,%s,%d,%d)",id,ref,targetCost,actualCost);
        }
    }

    public static void main(String[] args) {
        List<Foo> list = Arrays.asList(
            new Foo(1, "P1", 300, 400), 
            new Foo(2, "P2", 600, 400),
            new Foo(3, "P3", 30, 20),
            new Foo(3, "P3", 70, 20),
            new Foo(1, "P1", 360, 40),
            new Foo(4, "P4", 320, 200),
            new Foo(4, "P4", 500, 900));

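        // Group by id, then reduce each group's list to a single Foo holding the summed costs.
        List<Foo> transform = list.stream()
            .collect(Collectors.groupingBy(foo -> foo.id))
            .entrySet().stream()
            .map(e -> e.getValue().stream()
                .reduce((f1,f2) -> new Foo(f1.id,f1.ref,f1.targetCost + f2.targetCost,f1.actualCost + f2.actualCost)))
            // every group has at least one element, so the Optional is never empty
            .map(f -> f.get())
            .collect(Collectors.toList());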
        System.out.println(transform);
    }
}

Output:

[Foo(1,P1,660,440), Foo(2,P2,600,400), Foo(3,P3,100,40), Foo(4,P4,820,1100)]
Dici
  • If I understand correctly, you need to create a new Foo object on each reduce operation because otherwise the reduction is not safe for parallel operation. This is, however, a waste of resources, as we could modify the Foo object in place. What do you think? Could `reduce((f1,f2) -> { f1.targetCost += f2.targetCost; f1.actualCost += f2.actualCost; return f1;})` work? – Sobvan Jul 19 '17 at 06:57
  • The general rule when using functional style is that functions should be pure, which means without any side effects. Creating a new reference every time has a small cost, which should be negligible for the vast majority of applications. If you're really concerned about performance, don't use streams, as they introduce an overhead compared to a simple loop. – Dici Jul 19 '17 at 10:23
  • Thanks @Dici. After reading a bit more about this topic, I have found that stream().collect() rather than stream().reduce() is what I need if I do not want to create a new object on each iteration. This article is quite useful for understanding collect(): https://www.javabrahman.com/java-8/java-8-java-util-stream-collector-basics-tutorial-with-examples/ – Sobvan Jul 19 '17 at 11:46
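
To illustrate the collect() idea from the last comment, here is a minimal sketch of a mutable accumulation per group built with Collector.of. It sums both costs into one int[2] per id, so no new Foo is created per element and the input objects are not modified. The names MutableSumSketch, SUM_COSTS and sumCostsById are assumptions for illustration, and Foo is assumed to have int fields as in the answer above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collector;
import java.util.stream.Collectors;

class MutableSumSketch {
    // Hypothetical Foo with int fields, mirroring the answer's class.
    static final class Foo {
        final int id, targetCost, actualCost;
        final String ref;

        Foo(int id, String ref, int targetCost, int actualCost) {
            this.id = id;
            this.ref = ref;
            this.targetCost = targetCost;
            this.actualCost = actualCost;
        }
    }

    // Accumulates {targetCost sum, actualCost sum} in one mutable int[2] per group.
    static final Collector<Foo, int[], int[]> SUM_COSTS = Collector.of(
        () -> new int[2],                                    // fresh container per group
        (acc, foo) -> { acc[0] += foo.targetCost; acc[1] += foo.actualCost; },
        (left, right) -> { left[0] += right[0]; left[1] += right[1]; return left; });

    static Map<Integer, int[]> sumCostsById(List<Foo> data) {
        return data.stream().collect(Collectors.groupingBy(foo -> foo.id, SUM_COSTS));
    }

    public static void main(String[] args) {
        List<Foo> list = Arrays.asList(
            new Foo(1, "P1", 300, 400),
            new Foo(2, "P2", 600, 400),
            new Foo(1, "P1", 360, 40));
        sumCostsById(list).forEach((id, sums) ->
            System.out.println(id + "\t" + sums[0] + "\t" + sums[1]));
    }
}
```

Because the mutation is confined to containers that the collector itself creates, this stays compatible with parallel streams, unlike mutating f1 inside reduce.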
9
data.stream().collect(toMap(foo -> foo.id,
                       Function.identity(),
                       (a, b) -> new Foo(a.getId(),
                               a.getNum() + b.getNum(),
                               a.getXXX(),
                               a.getYYY()))).values();

Just use toMap(); it is very simple.
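
The snippet above uses placeholder getters (getNum, getXXX, getYYY). Applied to the Foo from the question, the toMap idea might look like the following sketch. It assumes int cost fields as in the other answers, and it passes LinkedHashMap::new as the map supplier (an addition not in the snippet above) so that the result keeps the encounter order of the ids. The names ToMapSketch and mergeById are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class ToMapSketch {
    // Hypothetical Foo with int fields, mirroring the other answers.
    static final class Foo {
        final int id, targetCost, actualCost;
        final String ref;

        Foo(int id, String ref, int targetCost, int actualCost) {
            this.id = id;
            this.ref = ref;
            this.targetCost = targetCost;
            this.actualCost = actualCost;
        }

        @Override
        public String toString() {
            return String.format("Foo(%d,%s,%d,%d)", id, ref, targetCost, actualCost);
        }
    }

    // Merges all Foo instances sharing an id; the merge function sums both costs.
    static List<Foo> mergeById(List<Foo> data) {
        Map<Integer, Foo> merged = data.stream().collect(Collectors.toMap(
            foo -> foo.id,
            Function.identity(),
            (a, b) -> new Foo(a.id, a.ref, a.targetCost + b.targetCost, a.actualCost + b.actualCost),
            LinkedHashMap::new)); // keeps the order in which ids first appear
        return new ArrayList<>(merged.values());
    }

    public static void main(String[] args) {
        List<Foo> data = Arrays.asList(
            new Foo(1, "P1", 300, 400),
            new Foo(2, "P2", 600, 400),
            new Foo(3, "P3", 30, 20),
            new Foo(3, "P3", 70, 20),
            new Foo(1, "P1", 360, 40),
            new Foo(4, "P4", 320, 200),
            new Foo(4, "P4", 500, 900));
        System.out.println(mergeById(data));
    }
}
```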

user1241671
6

Doing this with only the JDK's Stream API isn't really straightforward, as other answers have shown. This article explains how you can achieve the SQL semantics of GROUP BY in Java 8 (with standard aggregate functions) and by using jOOλ, a library that extends Stream for these use cases.

Write:

import static org.jooq.lambda.tuple.Tuple.tuple;

import java.util.List;
import java.util.stream.Collectors;

import org.jooq.lambda.Seq;
import org.jooq.lambda.tuple.Tuple;
// ...

List<Foo> list =

// FROM Foo
Seq.of(
    new Foo(1, "P1", 300, 400),
    new Foo(2, "P2", 600, 400),
    new Foo(3, "P3", 30, 20),
    new Foo(3, "P3", 70, 20),
    new Foo(1, "P1", 360, 40),
    new Foo(4, "P4", 320, 200),
    new Foo(4, "P4", 500, 900))

// GROUP BY f1, f2
.groupBy(
    x -> tuple(x.f1, x.f2),

// SELECT SUM(f3), SUM(f4)
    Tuple.collectors(
        Collectors.summingInt(x -> x.f3),
        Collectors.summingInt(x -> x.f4)
    )
)

// Transform the Map<Tuple2<Integer, String>, Tuple2<Integer, Integer>> type to List<Foo>
.entrySet()
.stream()
.map(e -> new Foo(e.getKey().v1, e.getKey().v2, e.getValue().v1, e.getValue().v2))
.collect(Collectors.toList());

Calling

System.out.println(list);

Will then yield

[Foo [f1=1, f2=P1, f3=660, f4=440],
 Foo [f1=2, f2=P2, f3=600, f4=400], 
 Foo [f1=3, f2=P3, f3=100, f4=40], 
 Foo [f1=4, f2=P4, f3=820, f4=1100]]
Lukas Eder
1
public <T, K> Collector<T, ?, Map<K, Integer>> groupSummingInt(Function<? super T, ? extends K> identity, ToIntFunction<? super T> val) {
    return Collectors.groupingBy(identity, Collectors.summingInt(val));
}
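
The helper above is posted without a usage example. A minimal sketch of how it might be called follows; the method is made static here only so the example is self-contained, and the class name, the Foo class with int fields and the targetCostById method are assumptions for illustration. Note that it sums a single int property per group, so covering both costs takes one call per property.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.ToIntFunction;
import java.util.stream.Collector;
import java.util.stream.Collectors;

class GroupSummingIntSketch {
    // Hypothetical Foo with int fields, mirroring the other answers.
    static final class Foo {
        final int id, targetCost, actualCost;
        final String ref;

        Foo(int id, String ref, int targetCost, int actualCost) {
            this.id = id;
            this.ref = ref;
            this.targetCost = targetCost;
            this.actualCost = actualCost;
        }
    }

    // The helper from the answer, declared static for this sketch.
    static <T, K> Collector<T, ?, Map<K, Integer>> groupSummingInt(
            Function<? super T, ? extends K> identity, ToIntFunction<? super T> val) {
        return Collectors.groupingBy(identity, Collectors.summingInt(val));
    }

    // Sums targetCost per id.
    static Map<Integer, Integer> targetCostById(List<Foo> data) {
        return data.stream()
            .collect(groupSummingInt((Foo foo) -> foo.id, (Foo foo) -> foo.targetCost));
    }

    public static void main(String[] args) {
        List<Foo> data = Arrays.asList(
            new Foo(1, "P1", 300, 400),
            new Foo(2, "P2", 600, 400),
            new Foo(1, "P1", 360, 40));
        targetCostById(data).forEach((id, sum) -> System.out.println(id + "\t" + sum));
    }
}
```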