
I'm trying to use the Java Stream API and its Collectors for various kinds of data aggregation and manipulation. My data set can range anywhere from a couple of thousand records to a couple of million.

Let's say that we have the below POJO class:

public class Item {
    String name;
    Double quantity;
    Double price;
    Double totalDollarAmount;

    public Item(String name, Double quantity, Double price) {
        this.name = name;
        this.quantity = quantity;
        this.price = price;
    }

    // Basic getters and setters

    public Double getTotalDollarAmount() {
        return getQuantity() * getPrice();
    }
}

From a List<Item> I want to quickly calculate how many of each item I've bought, the average price, and the total money spent on each item. For this scenario, let's say I have the following list:

        List<Item> itemsOnly = Arrays.asList(
                new Item("apple", 10.0, 9.99),
                new Item("banana", 20.0, 19.99),
                new Item("orange", 10.0, 29.99),
                new Item("watermelon", 10.0, 29.99),
                new Item("papaya", 20.0, 9.99),
                new Item("apple", 100.0, 9.99),
                new Item("apple", 20.0, 9.99)
        );

If I want to get the total quantity, average price, and total dollar amount of each unique item in that list I can do this:

System.out.println("Total Quantity for each Item: " + itemsOnly.stream().collect(
                Collectors.groupingBy(Item::getName, Collectors.summingDouble(Item::getQuantity))));
System.out.println("Average Price for each Item: " + itemsOnly.stream().collect(
                Collectors.groupingBy(Item::getName, Collectors.averagingDouble(Item::getPrice))));
System.out.println("Total Dollar Amount for each Item: " + itemsOnly.stream().collect(
                Collectors.groupingBy(Item::getName, Collectors.summingDouble(Item::getTotalDollarAmount))));

This would return the following:

Total Quantity for each Item: {papaya=20.0, orange=10.0, banana=20.0, apple=130.0, watermelon=10.0}
Average Price for each Item: {papaya=9.99, orange=29.99, banana=19.99, apple=9.99, watermelon=29.99}
Total Dollar Amount for each Item: {papaya=199.8, orange=299.9, banana=399.79999999999995, apple=1298.7, watermelon=299.9}

Now, what I want to do is store each of those values into a new Item object.

In the above example I would have a new object that would have the name set to "apple", the quantity = 130.0, price = 9.99, and total dollar amount = 1298.7.

I'd like to create this new Item without looping through a list of the item names I want and calling a getter on three different maps (quantity, average price, total amount). I'm not sure if this is possible, but ideally I'd get a map where the key is the name of the item and the value is a fully populated Item, like Map<String,Item>.

Is there any way to do this with Collectors? Is there a better way to do fast aggregation over a large data set in Java?

Naman
Porter

2 Answers


You're almost there. To merge your grouped items into a single one, you can use the reducing collector.

Here is a way to do it:

First, define a way to merge two Items:

public static Item merge (Item i1, Item i2) {
    final double count = i1.quantity + i2.quantity;
    final double avgPrice = (i1.quantity * i1.price + i2.quantity * i2.price) / count;
    return new Item(i1.name, count, avgPrice);
}

Then, use it for the downstream collector of the grouping operation. Here's the complete Main with the reducer:

import java.util.Map;
import java.util.List;
import java.util.Arrays;
import java.util.stream.Collectors;
import java.util.Optional;

public class Main
{
    public static void main(String[] args) {        
        List<Item> itemsOnly = Arrays.asList(
                new Item("apple", 10.0, 9.99),
                new Item("banana", 20.0, 19.99),
                new Item("orange", 10.0, 29.99),
                new Item("watermelon", 10.0, 29.99),
                new Item("papaya", 20.0, 9.99),
                new Item("apple", 100.0, 9.99),
                new Item("apple", 20.0, 9.99)
        );
        
        Map<String, Item> groupedItems = itemsOnly.stream().collect(
                Collectors.groupingBy(
                     item -> item.name,
                     Collectors.collectingAndThen(
                         Collectors.<Item>reducing(Main::merge),
                         Optional::get // No need for null check: grouping should send at least one element to the reducer
                    )
                )
        );

        for (Item i : groupedItems.values()) System.out.println(i);                 
    }
    
    public static Item merge (Item i1, Item i2) {
        final double count = i1.quantity + i2.quantity;
        final double avgPrice = (i1.quantity * i1.price + i2.quantity * i2.price) / count;
        return new Item(i1.name, count, avgPrice);
    }
    
    public static class Item {
        public final String name;
        public final double quantity;
        public final double price;

        public Item(String name, double quantity, double price) {
            this.name = name;
            this.quantity= quantity;
            this.price = price;
        }

        public double getTotalDollarAmount(){
          return quantity*price;
        }
        
        public String toString() { return String.format("%s: quantity: %d, price: %f, total: %f", name, (int) quantity, price, getTotalDollarAmount()); }
    }
}
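Note that `merge` computes a quantity-weighted average price, while `Collectors.averagingDouble` in the question computes a plain mean of the prices. The two agree on the sample data only because every apple costs 9.99. A minimal sketch with two hypothetical `pear` items at different prices shows the difference (the class name `WeightedMergeDemo` is made up for this example):

```java
// Sketch: the answer's merge logic applied to two items with different prices,
// showing that it yields a quantity-weighted average price.
class WeightedMergeDemo {
    static class Item {
        final String name;
        final double quantity;
        final double price;

        Item(String name, double quantity, double price) {
            this.name = name;
            this.quantity = quantity;
            this.price = price;
        }
    }

    // Same logic as the answer's merge: add quantities, weight prices by quantity.
    static Item merge(Item i1, Item i2) {
        final double count = i1.quantity + i2.quantity;
        final double avgPrice = (i1.quantity * i1.price + i2.quantity * i2.price) / count;
        return new Item(i1.name, count, avgPrice);
    }

    public static void main(String[] args) {
        // Hypothetical data: 10 pears at 1.0 and 30 pears at 2.0.
        Item merged = merge(new Item("pear", 10.0, 1.0), new Item("pear", 30.0, 2.0));
        // (10 * 1.0 + 30 * 2.0) / 40 = 70 / 40 = 1.75
        System.out.println(merged.quantity + " @ " + merged.price); // prints "40.0 @ 1.75"
        // An unweighted mean of the two prices, as averagingDouble would compute:
        System.out.println((1.0 + 2.0) / 2); // prints "1.5"
    }
}
```

If the plain mean is what you want, keep a count of merged records alongside the sum of prices instead of weighting by quantity.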
 

EDIT

As @Naman said in the comments, a simpler alternative to groupingBy + reducing is the toMap collector. The stream call would then look like this:

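// Also requires: import java.util.function.Function;
Map<String, Item> groupedItems = itemsOnly.stream().collect(
        Collectors.toMap(
                item -> item.name,     // key: the item name
                Function.identity(),   // value: the item itself
                Main::merge            // combines two items that share a name
        )
);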
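For reference, a self-contained, runnable sketch of the toMap variant might look like the following. The class name `ToMapDemo`, the helpers `aggregate` and `sampleItems`, and the use of the four-argument `toMap` overload with `TreeMap::new` (only so the output is printed in a stable order) are my own choices, not part of the answer:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

class ToMapDemo {
    static class Item {
        final String name;
        final double quantity;
        final double price;

        Item(String name, double quantity, double price) {
            this.name = name;
            this.quantity = quantity;
            this.price = price;
        }

        double getTotalDollarAmount() {
            return quantity * price;
        }
    }

    // The answer's merge: sums quantities and weights prices by quantity.
    static Item merge(Item i1, Item i2) {
        final double count = i1.quantity + i2.quantity;
        final double avgPrice = (i1.quantity * i1.price + i2.quantity * i2.price) / count;
        return new Item(i1.name, count, avgPrice);
    }

    // One pass over the list; TreeMap only makes the printed order predictable.
    static Map<String, Item> aggregate(List<Item> items) {
        return items.stream().collect(Collectors.toMap(
                item -> item.name,
                Function.identity(),
                ToMapDemo::merge,
                TreeMap::new));
    }

    static List<Item> sampleItems() {
        return Arrays.asList(
                new Item("apple", 10.0, 9.99),
                new Item("banana", 20.0, 19.99),
                new Item("orange", 10.0, 29.99),
                new Item("watermelon", 10.0, 29.99),
                new Item("papaya", 20.0, 9.99),
                new Item("apple", 100.0, 9.99),
                new Item("apple", 20.0, 9.99));
    }

    public static void main(String[] args) {
        aggregate(sampleItems()).forEach((name, item) -> System.out.printf(
                "%s: quantity=%.0f price=%.2f total=%.2f%n",
                name, item.quantity, item.price, item.getTotalDollarAmount()));
    }
}
```

With the question's sample data, the apple line should read `apple: quantity=130 price=9.99 total=1298.70`.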

In general, my advice is to read the official Javadoc of Collectors and other stream operations carefully, because each one has different computational properties (some are suited to parallel execution and others are not, some require pure functions, etc.). Choosing the right one for a use case can be tricky, as my own answer shows.
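As one illustration of those differences: the collector returned by `toMap` is not concurrent, whereas `Collectors.toConcurrentMap` is concurrent and unordered, so with a parallel stream all threads can accumulate into one shared `ConcurrentMap` instead of merging per-thread maps. The sketch below assumes the same pure `merge` function and uses hypothetical synthetic data (`syntheticItems`); whether it is actually faster than the sequential version depends on data size and hardware, so it is worth measuring before switching.

```java
import java.util.List;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class ParallelDemo {
    static class Item {
        final String name;
        final double quantity;
        final double price;

        Item(String name, double quantity, double price) {
            this.name = name;
            this.quantity = quantity;
            this.price = price;
        }
    }

    // Pure function (no shared state), so it is safe to call from several threads.
    static Item merge(Item i1, Item i2) {
        final double count = i1.quantity + i2.quantity;
        final double avgPrice = (i1.quantity * i1.price + i2.quantity * i2.price) / count;
        return new Item(i1.name, count, avgPrice);
    }

    // toConcurrentMap is CONCURRENT and UNORDERED, so a parallel stream can
    // accumulate into a single shared ConcurrentMap.
    static ConcurrentMap<String, Item> aggregate(List<Item> items) {
        return items.parallelStream().collect(Collectors.toConcurrentMap(
                item -> item.name,
                Function.identity(),
                ParallelDemo::merge));
    }

    // Hypothetical synthetic data: n items cycling through three names, all priced 2.0.
    static List<Item> syntheticItems(int n) {
        String[] names = {"apple", "banana", "orange"};
        return IntStream.range(0, n)
                .mapToObj(i -> new Item(names[i % 3], 1.0, 2.0))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        aggregate(syntheticItems(1_000_000)).forEach((name, item) ->
                System.out.printf("%s: quantity=%.0f price=%.2f%n", name, item.quantity, item.price));
    }
}
```

Because the merge order is not fixed in a concurrent reduction, the merge function must be associative; the weighted average is associative mathematically, though floating-point rounding can differ slightly between runs on less regular data.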

amanin
  • It's better to [replace `groupingBy` + `reducing` with `toMap`](https://stackoverflow.com/questions/57041896/java-streams-replacing-groupingby-and-reducing-by-tomap). – Naman Feb 04 '21 at 01:50
  • @Naman Thanks for the tip, I've updated my answer accordingly. – amanin Feb 04 '21 at 07:21

You could implement a class ItemStats which will take care of collecting all relevant statistics and collect using Collectors.toMap:

class ItemStats extends Item {
    private int count;

    public ItemStats(Item item) {
        super(item.getName(), item.getQuantity(), item.getPrice());
        this.totalDollarAmount = item.getTotalDollarAmount();
        this.count = 1;
    }

    // Folds another item into this one; `price` holds the running sum of prices.
    public ItemStats merge(Item item) {
        this.quantity += item.getQuantity();
        this.price += item.getPrice();
        this.totalDollarAmount += item.getTotalDollarAmount();
        this.count++;

        return this;
    }

    public Double getAveragePrice() {
        return this.price / this.count;
    }

    // Return the accumulated total; the inherited getter would compute
    // quantity * (sum of prices), which overstates the amount.
    @Override
    public Double getTotalDollarAmount() {
        return this.totalDollarAmount;
    }
}

// test class (needs java.util.LinkedHashMap, java.util.Map and java.util.stream.Collectors)
Map<String, ItemStats> stats = itemsOnly
        .stream()
        .collect(Collectors.toMap(
            Item::getName,
            ItemStats::new,
            ItemStats::merge,
            LinkedHashMap::new
        ));
stats.forEach((k, v) -> System.out.printf("%s: total quantity=%.0f avg.price=%.2f total amount=$%.2f%n",
        k, v.getQuantity(), v.getAveragePrice(), v.getTotalDollarAmount()));

Output:

apple: total quantity=130 avg.price=9.99 total amount=$1298.70
banana: total quantity=20 avg.price=19.99 total amount=$399.80
orange: total quantity=10 avg.price=29.99 total amount=$299.90
watermelon: total quantity=10 avg.price=29.99 total amount=$299.90
papaya: total quantity=20 avg.price=9.99 total amount=$199.80
Nowhere Man
  • You don't really need an additional class per se; the merge method should be sufficient, as described in the other [existing answer](https://stackoverflow.com/a/66036027/1746118) – Naman Feb 04 '21 at 01:51
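Along the lines of the ItemStats idea, the per-group statistics can also live in a separate mutable accumulator wired up with `Collector.of`, so the result type does not need to extend Item and carries exactly the three figures the question asks for. This is a sketch under my own assumptions: `Stats`, `Summary`, `TO_SUMMARY` and the helper methods are hypothetical names, and the average price is unweighted to match `averagingDouble`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collector;
import java.util.stream.Collectors;

class StatsCollectorDemo {
    static class Item {
        final String name;
        final double quantity;
        final double price;

        Item(String name, double quantity, double price) {
            this.name = name;
            this.quantity = quantity;
            this.price = price;
        }

        double getTotalDollarAmount() {
            return quantity * price;
        }
    }

    // Hypothetical mutable accumulator: one instance per item name, updated in place.
    static class Stats {
        String name;
        long count;
        double quantity;
        double priceSum;
        double total;

        void add(Item item) {
            name = item.name;
            count++;
            quantity += item.quantity;
            priceSum += item.price;
            total += item.getTotalDollarAmount();
        }

        // Only called when partial results are merged, e.g. for a parallel stream.
        Stats combine(Stats other) {
            if (name == null) {
                name = other.name;
            }
            count += other.count;
            quantity += other.quantity;
            priceSum += other.priceSum;
            total += other.total;
            return this;
        }
    }

    // Hypothetical immutable result with the figures the question asks for.
    static class Summary {
        final String name;
        final double quantity;
        final double averagePrice; // unweighted, like Collectors.averagingDouble
        final double totalDollarAmount;

        Summary(Stats s) {
            name = s.name;
            quantity = s.quantity;
            averagePrice = s.priceSum / s.count;
            totalDollarAmount = s.total;
        }
    }

    static final Collector<Item, Stats, Summary> TO_SUMMARY =
            Collector.of(Stats::new, Stats::add, Stats::combine, Summary::new);

    // One pass over the data; TreeMap only keeps the printed order stable.
    static Map<String, Summary> aggregate(List<Item> items) {
        return items.stream().collect(Collectors.groupingBy(item -> item.name, TreeMap::new, TO_SUMMARY));
    }

    static List<Item> sampleItems() {
        return Arrays.asList(
                new Item("apple", 10.0, 9.99),
                new Item("banana", 20.0, 19.99),
                new Item("orange", 10.0, 29.99),
                new Item("watermelon", 10.0, 29.99),
                new Item("papaya", 20.0, 9.99),
                new Item("apple", 100.0, 9.99),
                new Item("apple", 20.0, 9.99));
    }

    public static void main(String[] args) {
        aggregate(sampleItems()).forEach((name, s) -> System.out.printf(
                "%s: quantity=%.0f avg.price=%.2f total=%.2f%n",
                name, s.quantity, s.averagePrice, s.totalDollarAmount));
    }
}
```

`groupingBy` creates one `Stats` per item name and calls `add` for each element, so all three figures come out of a single pass; `combine` only matters if the stream is made parallel.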