Java Streams – How to group by value and find min and max value of each group?

Question

In my example, I have a list of Car objects and found the min and max price for each group (grouped by make).

List<Car> carsDetails = UserDB.getCarsDetails();
Map<String, DoubleSummaryStatistics> collect4 = carsDetails.stream()
                .collect(Collectors.groupingBy(Car::getMake, Collectors.summarizingDouble(Car::getPrice)));
collect4.entrySet().forEach(e->System.out.println(e.getKey()+" "+e.getValue().getMax()+" "+e.getValue().getMin()));

Output:
Lexus 94837.79 17569.59
Subaru 96583.25 8498.41
Chevrolet 99892.59 6861.85

But I couldn't find which car objects have max and min price. How can I do that?

*But I couldn't find which car objects have max and min price* -> Do you mean the biggest price difference by that? — Lino, Jul 17 '18 at 09:32
so you have a list of cars and you want to find the max and min of those cars based on price? — Eugene, Jul 17 '18 at 09:35
don't use `DoubleSummaryStatistics` to only get min and max. — Ousmane D., Jul 17 '18 at 09:40
@Eugene Yes, you are right: based on model, find the max and min of those cars — Learn Hadoop, Jul 17 '18 at 10:05

Holger · Accepted Answer · 2018-07-17T10:06:32.833

If you were interested in only one Car per group, you could use, e.g.

Map<String, Car> mostExpensives = carsDetails.stream()
    .collect(Collectors.toMap(Car::getMake, Function.identity(),
        BinaryOperator.maxBy(Comparator.comparing(Car::getPrice))));
mostExpensives.forEach((make,car) -> System.out.println(make+" "+car));

But since you want the most expensive and the cheapest, you need something like this:

Map<String, List<Car>> mostExpensivesAndCheapest = carsDetails.stream()
    .collect(Collectors.toMap(Car::getMake, car -> Arrays.asList(car, car),
        (l1,l2) -> Arrays.asList(
            (l1.get(0).getPrice()>l2.get(0).getPrice()? l2: l1).get(0),
            (l1.get(1).getPrice()<l2.get(1).getPrice()? l2: l1).get(1))));
mostExpensivesAndCheapest.forEach((make,cars) -> System.out.println(make
        +" cheapest: "+cars.get(0)+" most expensive: "+cars.get(1)));

This solution bears a bit of inconvenience due to the fact that there is no generic statistics object equivalent to DoubleSummaryStatistics. If this happens more than once, it’s worth filling the gap with a class like this:

/**
 * Like {@code DoubleSummaryStatistics}, {@code IntSummaryStatistics}, and
 * {@code LongSummaryStatistics}, but for an arbitrary type {@code T}.
 */
public class SummaryStatistics<T> implements Consumer<T> {
    /**
     * Collect to a {@code SummaryStatistics} for natural order.
     */
    public static <T extends Comparable<? super T>> Collector<T,?,SummaryStatistics<T>>
                  statistics() {
        return statistics(Comparator.<T>naturalOrder());
    }
    /**
     * Collect to a {@code SummaryStatistics} using the specified comparator.
     */
    public static <T> Collector<T,?,SummaryStatistics<T>>
                  statistics(Comparator<T> comparator) {
        Objects.requireNonNull(comparator);
        return Collector.of(() -> new SummaryStatistics<>(comparator),
            SummaryStatistics::accept, SummaryStatistics::merge);
    }
    private final Comparator<T> c;
    private T min, max;
    private long count;
    public SummaryStatistics(Comparator<T> comparator) {
        c = Objects.requireNonNull(comparator);
    }

    public void accept(T t) {
        if(count == 0) {
            count = 1;
            min = t;
            max = t;
        }
        else {
            if(c.compare(min, t) > 0) min = t;
            if(c.compare(max, t) < 0) max = t;
            count++;
        }
    }
    public SummaryStatistics<T> merge(SummaryStatistics<T> s) {
        if(s.count > 0) {
            if(count == 0) {
                count = s.count;
                min = s.min;
                max = s.max;
            }
            else {
                if(c.compare(min, s.min) > 0) min = s.min;
                if(c.compare(max, s.max) < 0) max = s.max;
                count += s.count;
            }
        }
        return this;
    }

    public long getCount() {
        return count;
    }

    public T getMin() {
        return min;
    }

    public T getMax() {
        return max;
    }

    @Override
    public String toString() {
        return count == 0? "empty": (count+" elements between "+min+" and "+max);
    }
}

After adding this to your code base, you may use it like

Map<String, SummaryStatistics<Car>> mostExpensives = carsDetails.stream()
    .collect(Collectors.groupingBy(Car::getMake,
        SummaryStatistics.statistics(Comparator.comparing(Car::getPrice))));
mostExpensives.forEach((make,cars) -> System.out.println(make+": "+cars));

If getPrice returns double, it may be more efficient to use Comparator.comparingDouble(Car::getPrice) instead of Comparator.comparing(Car::getPrice).
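
As a minimal sketch of that difference, assuming a hypothetical `Car` whose `getPrice()` returns a primitive `double` (the question does not show the real class): both comparators impose the same order, but `comparing` boxes each price into a `Double`, while `comparingDouble` compares the primitives directly. The class and method names below are made up for illustration.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class ComparingDoubleDemo {
    // Hypothetical stand-in for the question's Car class; getPrice() is
    // assumed to return a primitive double here.
    static final class Car {
        private final String make;
        private final double price;
        Car(String make, double price) { this.make = make; this.price = price; }
        String getMake() { return make; }
        double getPrice() { return price; }
    }

    // Boxes every price into a Double before comparing.
    static final Comparator<Car> BOXED = Comparator.comparing(Car::getPrice);
    // Compares the primitive values directly via Double.compare.
    static final Comparator<Car> PRIMITIVE = Comparator.comparingDouble(Car::getPrice);

    static String cheapestMake(List<Car> cars, Comparator<Car> order) {
        return cars.stream().min(order).map(Car::getMake).orElse("none");
    }

    public static void main(String[] args) {
        List<Car> cars = Arrays.asList(
                new Car("Lexus", 17569.59),
                new Car("Subaru", 8498.41),
                new Car("Chevrolet", 6861.85));
        // Both comparators impose the same order; only the boxing differs.
        System.out.println(cheapestMake(cars, BOXED));     // Chevrolet
        System.out.println(cheapestMake(cars, PRIMITIVE)); // Chevrolet
    }
}
```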

This might be a bit nitpicky, but: use `Comparator.comparingDouble(...)` instead of `Comparator.comparing(...)`. I first learned this from Holger :) — Ousmane D., Jul 17 '18 at 09:47
@Aominè that’s what I would use when I know for sure that `getPrice` returns `double` instead of `Double`. — Holger, Jul 17 '18 at 09:59
@Holger well I can’t see any indication that getPrice returns a type Double or double but I assumed it’s of type double just as you assumed it was type Double. — Ousmane D., Jul 17 '18 at 10:03
@Holger Thanks, it is working as expected, but I couldn't understand the logic in the SummaryStatistics class. It would be great if you could shed some light on it — Learn Hadoop, Jul 17 '18 at 10:20
@LearnHadoop well, as said, the logic is supposed to be similar to the numerical statistics objects like, e.g. [`DoubleSummaryStatistics`](https://docs.oracle.com/javase/8/docs/api/java/util/DoubleSummaryStatistics.html), so you can find some hints in their documentation. Not being numeric, we don’t have sum nor average, so it only maintains `min`, `max`, and `count`. Basically, the collector will call `accept` for every element resp. for every element of a group when combining with `groupingBy`. Only with parallel evaluation, it may call the `merge` method to combine partial results. — Holger, Jul 17 '18 at 10:26
@Holger How about this approach? ` Map collect = cars.stream() .collect(Collectors.groupingBy(Car::getModel, Collectors.summarizingDouble(Car::getPrice)));` — Ravindra Ranwala, Jul 17 '18 at 10:26
@RavindraRanwala that’s already given in the question. The question is about what this approach does not provide. — Holger, Jul 17 '18 at 10:27
@Holger I think the first (`List`-based) solution is way too convoluted to be used in any code that you'd need to maintain. As for the second (very good) solution, I would like to suggest extracting the `min` and `max` `Comparator`-related assignments to a separate method (like `updateMin(T)` and `updateMax(T)` in my solution) - I believe it would make it more readable. — Tomasz Linkowski, Jul 17 '18 at 12:23
@TomaszLinkowski the `List` based solution is mainly for the “can I do this without creating a new class” fraction… — Holger, Jul 17 '18 at 13:58
@Holger I'd suggest you made it very clear in the answer, and even reversed the order of the solutions. Currently, the answer says "you need something like this" and then "If this happens more than once, it’s worth filling the gap with a class like this". I strongly believe the second solution should be used at all times (save for some coding for fun, maybe). The first solution is where the maintenance hell begins (try to unit-test it, for example) ;) — Tomasz Linkowski, Jul 17 '18 at 14:26
@Holger If you're not going to edit the answer, I'd appreciate your letting me know why not, or expressing your permission for me to edit it. — Tomasz Linkowski, Jul 20 '18 at 06:37
@TomaszLinkowski I understand your reasoning, but the last solution is significantly longer than the others, so I keep this order so that all three solutions are immediately visible on average screen sizes without scrolling. I consider the reader to be smart enough to realize why the last one is the maintainable one. — Holger, Jul 20 '18 at 08:37
@Holger Well, certainly you have much more faith in the readers than I do ;) I'd think that most insufficiently experienced programmers would assume that the shorter the solution the more maintainable it is. Anyway, thanks for your explanation! — Tomasz Linkowski, Jul 20 '18 at 09:10
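
The `accept`/`merge` logic explained in the comments above can be traced by hand. The following is only a sketch, not Holger's class: `MinMax` is a condensed, hypothetical stand-in for `SummaryStatistics`, and the helper methods `sequential` and `split` (also made up here) drive the collector manually, first the way a sequential evaluation does, then the way a parallel evaluation may split the input and merge partial results.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collector;

class CollectorStepsDemo {
    // Condensed, hypothetical stand-in for SummaryStatistics: min, max, count only.
    static final class MinMax<T> {
        private final Comparator<T> c;
        private T min, max;
        private long count;

        MinMax(Comparator<T> c) { this.c = c; }

        // Called once per stream element (or per element of a group).
        void accept(T t) {
            if (count == 0 || c.compare(min, t) > 0) min = t;
            if (count == 0 || c.compare(max, t) < 0) max = t;
            count++;
        }

        // Folds another partial result into this one.
        MinMax<T> merge(MinMax<T> other) {
            if (other.count > 0) {
                if (count == 0 || c.compare(min, other.min) > 0) min = other.min;
                if (count == 0 || c.compare(max, other.max) < 0) max = other.max;
                count += other.count;
            }
            return this;
        }

        @Override
        public String toString() {
            return count + " elements between " + min + " and " + max;
        }
    }

    static Collector<Integer, MinMax<Integer>, MinMax<Integer>> collector() {
        return Collector.of(() -> new MinMax<Integer>(Comparator.<Integer>naturalOrder()),
                MinMax::accept, MinMax::merge);
    }

    // Sequential evaluation: one container, accept() called for every element.
    static String sequential(List<Integer> values) {
        Collector<Integer, MinMax<Integer>, MinMax<Integer>> col = collector();
        MinMax<Integer> container = col.supplier().get();
        for (Integer v : values) col.accumulator().accept(container, v);
        return container.toString();
    }

    // What a parallel evaluation may do: one container per part, then merge().
    static String split(List<Integer> left, List<Integer> right) {
        Collector<Integer, MinMax<Integer>, MinMax<Integer>> col = collector();
        MinMax<Integer> a = col.supplier().get();
        MinMax<Integer> b = col.supplier().get();
        left.forEach(v -> col.accumulator().accept(a, v));
        right.forEach(v -> col.accumulator().accept(b, v));
        return col.combiner().apply(a, b).toString();
    }

    public static void main(String[] args) {
        List<Integer> prices = Arrays.asList(5, 3, 9, 1);
        System.out.println(sequential(prices));
        System.out.println(split(prices.subList(0, 2), prices.subList(2, 4)));
        // Both lines print: 4 elements between 1 and 9
    }
}
```

Both paths end in the same state, which is the property a combiner has to guarantee for the collector to work with parallel streams.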

rolve · Answer 2 · 2020-03-25T15:16:22.457

Here is a very concise solution. It collects all Cars into a SortedSet and thus works without any additional classes.

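// Assumes static imports of Collectors.groupingBy, Collectors.toCollection
// and Comparator.comparingDouble.
Map<String, SortedSet<Car>> grouped = carsDetails.stream()
        .collect(groupingBy(Car::getMake, toCollection(
                () -> new TreeSet<>(comparingDouble(Car::getPrice)))));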

grouped.forEach((make, cars) -> System.out.println(make
        + " cheapest: " + cars.first()
        + " most expensive: " + cars.last()));

A possible downside is performance, as all Cars are collected, not just the current min and max. But unless the data set is very large, I don't think it will be noticeable.
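
One more property of this approach is worth knowing: a `TreeSet` treats elements that compare as equal as duplicates, so cars of the same make with identical prices collapse into a single entry. `first()` and `last()` still return a cheapest and a most expensive car, but the set is not a complete list of the group. A minimal sketch, assuming a hypothetical `Car` with make, model and price (the sample data is invented):

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.stream.Collectors;

class SortedSetGroupingDemo {
    // Hypothetical Car with just the fields this sketch needs.
    static final class Car {
        private final String make;
        private final String model;
        private final double price;
        Car(String make, String model, double price) {
            this.make = make;
            this.model = model;
            this.price = price;
        }
        String getMake() { return make; }
        String getModel() { return model; }
        double getPrice() { return price; }
    }

    // Groups by make; each group is a TreeSet ordered by price only.
    static Map<String, SortedSet<Car>> groupByMake(List<Car> cars) {
        return cars.stream().collect(Collectors.groupingBy(Car::getMake,
                Collectors.toCollection(
                        () -> new TreeSet<Car>(Comparator.comparingDouble(Car::getPrice)))));
    }

    public static void main(String[] args) {
        List<Car> cars = Arrays.asList(
                new Car("Lexus", "IS", 30000.0),
                new Car("Lexus", "ES", 30000.0),   // same price as the IS
                new Car("Lexus", "LS", 80000.0));
        SortedSet<Car> lexus = groupByMake(cars).get("Lexus");
        System.out.println(lexus.size());             // 2, not 3: the ES was dropped
        System.out.println(lexus.first().getModel()); // IS
        System.out.println(lexus.last().getModel());  // LS
    }
}
```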

Tomasz Linkowski · Answer 3 · 2018-07-17T14:16:35.090

I would like to propose a solution that (in my opinion) strives for greatest readability (which reduces e.g. the maintenance burden of such code).

It's Collector-based so - as a bonus - it can be used with a parallel Stream. It assumes the objects are non-null.

final class MinMaxFinder<T> {

    private final Comparator<T> comparator;

    MinMaxFinder(Comparator<T> comparator) {
        this.comparator = comparator;
    }

    Collector<T, ?, MinMaxResult<T>> collector() {
        return Collector.of(
                MinMaxAccumulator::new,
                MinMaxAccumulator::add,
                MinMaxAccumulator::combine,
                MinMaxAccumulator::toResult
        );
    }

    private class MinMaxAccumulator {
        T min = null;
        T max = null;

        MinMaxAccumulator() {
        }

        private boolean isEmpty() {
            return min == null;
        }

        void add(T item) {
            if (isEmpty()) {
                min = max = item;
            } else {
                updateMin(item);
                updateMax(item);
            }
        }

        MinMaxAccumulator combine(MinMaxAccumulator otherAcc) {
            if (isEmpty()) {
                return otherAcc;
            }
            if (!otherAcc.isEmpty()) {
                updateMin(otherAcc.min);
                updateMax(otherAcc.max);
            }
            return this;
        }

        private void updateMin(T item) {
            min = BinaryOperator.minBy(comparator).apply(min, item);
        }

        private void updateMax(T item) {
            max = BinaryOperator.maxBy(comparator).apply(max, item);
        }

        MinMaxResult<T> toResult() {
            return new MinMaxResult<>(min, max);
        }
    }
}

The result-holder value-like class:

public class MinMaxResult<T> {
    private final T min;
    private final T max;

    public MinMaxResult(T min, T max) {
        this.min = min;
        this.max = max;
    }

    public T min() {
        return min;
    }

    public T max() {
        return max;
    }
}

Usage:

MinMaxFinder<Car> minMaxFinder = new MinMaxFinder<>(Comparator.comparing(Car::getPrice));
Map<String, MinMaxResult<Car>> minMaxResultMap = carsDetails.stream()
            .collect(Collectors.groupingBy(Car::getMake, minMaxFinder.collector()));

edited Jul 17 '18 at 14:16

answered Jul 17 '18 at 10:21

That’s close to my `SummaryStatistics` approach, though I don’t copy the result into an immutable type afterwards. Note that it is possible to support ordering for such a collector, i.e. to guarantee that in case of a tie, the first encountered min/max element is kept. – Holger Jul 17 '18 at 10:32
But keep in mind that when you support arbitrary `Comparator`s, it might be a [`nullsFirst`](https://docs.oracle.com/javase/8/docs/api/java/util/Comparator.html#nullsFirst-java.util.Comparator-) variant, so `min == null` is not a sufficient criterion to assume that the `MinMaxAccumulator` is empty. – Holger Jul 17 '18 at 10:40
@Holger Thanks for your comments! Indeed, it's very close to your `SummaryStatistics` (I didn't notice the edit when I started to write the answer, though). The remark about the ordering is very good - I updated the answer to reflect that (note, however, that ordering will be preserved only during a sequential collection). As to nullability, I pointed out that this solution supports only non-null elements, but I think your approach (with counting the elements) is indeed much better. – Tomasz Linkowski Jul 17 '18 at 11:53
Oh, and the reason I dumped the result to an object of another class is that, after completing the collection, I didn't want to hold on to the (now redundant) `Comparator`. – Tomasz Linkowski Jul 17 '18 at 12:29
Yes, I assumed that writing the answer took some time and overlapped with my edit. Regarding the comparator, I could add a getter for the comparator as well, as someone might be interested in the comparator used for determining min and max. Maintaining the order works even for parallel processing as long as the combiner prefers the left argument in case of a tie, as for [ordered streams](https://stackoverflow.com/a/29218074/2711488), the implementation will take care to invoke it with the proper arguments (that’s why `toList()` or `joining(…)` work with parallel streams too). – Holger Jul 17 '18 at 13:29
Oh, I didn't realize that ordering can be preserved even during parallel processing; thanks! Indeed, I had no clue how `toList()` or `joining()` are able to work in parallel mode :) As far as I understand, the current revision of my answer (where I simply changed the order of arguments in `updateMax`) satisfies the condition for preserving the ordering. And good point about the `Comparator`, too. – Tomasz Linkowski Jul 17 '18 at 13:40
No, no, both, `minBy` and `maxBy`, return the left argument when both arguments are equal, hence, you should keep the `value = function.apply(value, newItem)` pattern, as for both, the previously encountered value should be preferred. The only issue is, that while `minBy` and `maxBy` exhibit this behavior (and everything speaks for assuming it to be intended behavior), the documentation doesn’t say so explicitly, so some developers feel uncomfortable with code relying on this behavior. So your original code was already capable of maintaining the order, you only have to remove the `UNORDERED`. – Holger Jul 17 '18 at 14:07
Hm, I got it completely wrong then :) I haven't read your comment carefully enough, and I somehow assumed it should be the *first* `min` and the *last* `max`. It kind of seemed more useful (e.g. if all items compared as the same, you would get the first and the last). Nevermind, though. I'll correct it, and I'll remove `UNORDERED`, as you suggested. Thanks! – Tomasz Linkowski Jul 17 '18 at 14:15
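
Two points from this comment thread can be checked directly. The sketch below assumes a hypothetical `Car` with a name and a price; the helper names are invented. It shows that `BinaryOperator.minBy` and `maxBy` both return the left argument on a tie (behavior of the JDK implementation that, as noted above, the documentation does not spell out), and that with a `nullsFirst` comparator a `null` element is a legitimate minimum, so `min == null` cannot serve as an "empty" flag.

```java
import java.util.Comparator;
import java.util.function.BinaryOperator;

class TieAndNullDemo {
    // Hypothetical Car with just a name and a price.
    static final class Car {
        private final String name;
        private final double price;
        Car(String name, double price) { this.name = name; this.price = price; }
        String getName() { return name; }
        double getPrice() { return price; }
    }

    static final Comparator<Car> BY_PRICE = Comparator.comparingDouble(Car::getPrice);

    // Name of the car that minBy keeps when called as apply(first, second).
    static String minOf(Car first, Car second) {
        return BinaryOperator.minBy(BY_PRICE).apply(first, second).getName();
    }

    // Name of the car that maxBy keeps when called as apply(first, second).
    static String maxOf(Car first, Car second) {
        return BinaryOperator.maxBy(BY_PRICE).apply(first, second).getName();
    }

    // With nullsFirst, null compares as smaller than any car, so it wins minBy.
    static boolean nullIsMinimum() {
        Comparator<Car> nullsFirst = Comparator.nullsFirst(BY_PRICE);
        Car car = new Car("any", 1.0);
        return BinaryOperator.minBy(nullsFirst).apply(car, null) == null;
    }

    public static void main(String[] args) {
        Car first = new Car("first", 100.0);
        Car second = new Car("second", 100.0);   // same price: a tie
        System.out.println(minOf(first, second)); // first
        System.out.println(maxOf(first, second)); // first
        System.out.println(nullIsMinimum());      // true
    }
}
```

This is why an accumulator following the `value = function.apply(value, newItem)` pattern keeps the first encountered element for both min and max, and why counting elements is a more robust emptiness check than testing for `null`.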

score 0 · Answer 4 · edited Apr 16 '23 at 02:57

For each make, the Car with the max price:

Map<String, Optional<Car>> groupByMaxPrice =
             carsDetails.stream().collect(
                     Collectors.groupingBy(Car::getMake, Collectors.maxBy(Comparator.comparing(Car::getPrice))));

For each make, the Car with the min price:

Map<String, Optional<Car>> groupByMinPrice =
             carsDetails.stream().collect(
                     Collectors.groupingBy(Car::getMake, Collectors.minBy(Comparator.comparing(Car::getPrice))));
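
The `Optional` values in these maps are never empty, because `groupingBy` only creates a group once it has seen an element for it. As a sketch of one way to get plain `Car` values instead, the `Optional` can be unwrapped with `Collectors.collectingAndThen`. The `Car` class and the method names below are hypothetical, and `getPrice()` is assumed to return a primitive `double`.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

class UnwrapOptionalDemo {
    // Hypothetical stand-in for the question's Car class.
    static final class Car {
        private final String make;
        private final double price;
        Car(String make, double price) { this.make = make; this.price = price; }
        String getMake() { return make; }
        double getPrice() { return price; }
    }

    // Cheapest car per make; Optional::get is safe because groups are never empty.
    static Map<String, Car> cheapestByMake(List<Car> cars) {
        return cars.stream().collect(Collectors.groupingBy(Car::getMake,
                Collectors.collectingAndThen(
                        Collectors.minBy(Comparator.comparingDouble(Car::getPrice)),
                        Optional::get)));
    }

    // Most expensive car per make, built the same way with maxBy.
    static Map<String, Car> mostExpensiveByMake(List<Car> cars) {
        return cars.stream().collect(Collectors.groupingBy(Car::getMake,
                Collectors.collectingAndThen(
                        Collectors.maxBy(Comparator.comparingDouble(Car::getPrice)),
                        Optional::get)));
    }

    public static void main(String[] args) {
        List<Car> cars = Arrays.asList(
                new Car("Lexus", 94837.79),
                new Car("Lexus", 17569.59),
                new Car("Subaru", 8498.41),
                new Car("Subaru", 96583.25));
        System.out.println(cheapestByMake(cars).get("Lexus").getPrice());       // 17569.59
        System.out.println(mostExpensiveByMake(cars).get("Subaru").getPrice()); // 96583.25
    }
}
```

Note that this still traverses the list once per map, so it answers "min" and "max" separately rather than in a single pass.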
