17

I have a stream of Foo objects.

class Foo {
    private int variableCount;
    public Foo(int vars) {
        this.variableCount = vars; 
    }
    public Integer getVariableCount() { 
      return variableCount; 
    }
}

I want a list of Foo's that all have the lowest variableCount.

For example

new Foo(3), new Foo(3), new Foo(2), new Foo(1), new Foo(1)

I only want the stream to return the last 2 Foos, since they have the lowest value.

I've tried doing a collect with grouping by

.collect(Collectors.groupingBy((Foo foo) -> {
                    return foo.getVariableCount();
})

And that returns a Map<Integer, List<Foo>> and I'm not sure how to transform that into what I want.

Thanks in advance

rogerdpack
  • 62,887
  • 36
  • 269
  • 388
James Kleeh
  • 12,094
  • 5
  • 34
  • 61

7 Answers7

15

You can use a sorted map for grouping and then just get the first entry. Something along the lines:

Collectors.groupingBy(
    Foo::getVariableCount,
    TreeMap::new,
    Collectors.toList())
.firstEntry()
.getValue()
lexicore
  • 42,748
  • 17
  • 132
  • 221
  • This does work, however I was hoping for a solution that didn't require the construction of the entire map. Thank you! – James Kleeh Feb 27 '18 at 17:22
  • 4
    @JamesKleeh Yes, this is suboptimal but works OOTB. I think there should be a better solution with a custom collector. – lexicore Feb 27 '18 at 17:23
  • @JamesKleeh A constant trade off in software: better performance or better developer time (including maintenance). – jpmc26 Feb 28 '18 at 04:38
11

Here is a solution that:

  1. Only streams the list once.
  2. Doesn't build a map or other structure that contains all of the input items (unless the variable counts are all the same), only keeping those that are currently the minimum.
  3. Is O(n) time, O(n) space. It's entirely possible that all Foos have the same variable count, in which case this solution would store all items like other solutions. But in practice, with different, varied values and higher cardinality, the number of items in the list is likely to be much lower.

Edited

I've improved my solution according to the suggestions in the comments.

I implemented an accumulator object, which supplies functions to the Collector for this.

/**
 * Accumulator object to hold the current min
 * and the list of Foos that are the min.
 */
class Accumulator {
    Integer min;
    List<Foo> foos;

    Accumulator() {
        min = Integer.MAX_VALUE;
        foos = new ArrayList<>();
    }

    void accumulate(Foo f) {
        if (f.getVariableCount() != null) {
            if (f.getVariableCount() < min) {
                min = f.getVariableCount();
                foos.clear();
                foos.add(f);
            } else if (f.getVariableCount() == min) {
                foos.add(f);
            }
        }
    }

    Accumulator combine(Accumulator other) {
        if (min < other.min) {
            return this;
        }
        else if (min > other.min) {
            return other;
        }
        else {
            foos.addAll(other.foos);
            return this;
        }
    }

    List<Foo> getFoos() { return foos; }
}

Then all we have to do is collect, referencing the accumulator's methods for its functions.

List<Foo> mins = foos.stream().collect(Collector.of(
    Accumulator::new,
    Accumulator::accumulate,
    Accumulator::combine,
    Accumulator::getFoos
    )
);

Testing with

List<Foo> foos = Arrays.asList(new Foo(3), new Foo(3), new Foo(2), new Foo(1), new Foo(1), new Foo(4));

The output is (with a suitable toString defined on Foo):

[Foo{1}, Foo{1}]
rgettman
  • 176,041
  • 30
  • 275
  • 357
  • 1
    @rgettman I can't tell who would win a test for performance, a `Map`, stream twice, or constantly (potentially) deleting elements from a `List`; still since this is checked, I guess this is what the OP was looking for. 1+ – Eugene Feb 27 '18 at 19:31
  • 6
    Your Collector’s characteristics are entirely unreasonable. Your Collector is not `CONCURRENT`, as it uses an `ArrayList` which is not thread safe. And since it maintains the order, specifying `UNORDERED` destroys that useful property. Besides that, using an `EnumSet` would be straight-forward. But note that there’s rarely a need to implement the `Collector` interface. Using `Collector.of(Accumulator::new, your accumulator function, your combiner function, Accumulator::getFoos)` would be enough. And your combiner has to consider the possibility that either `min` is still `null`. – Holger Feb 28 '18 at 07:36
  • An alternative would be to give the `Accumulator` the responsibility to handle any new `Foo`. But that's a detail. – glglgl Feb 28 '18 at 08:12
  • 1
    @glglgl since the functions are nontrivial, that would be indeed a good move. Then, the collector would get implemented as `Collector.of(Accumulator::new, Accumulator::add, Accumulator::merge, Accumulator::getFoos)`, similar to how `joining` using `StringJoiner` internally… – Holger Feb 28 '18 at 08:15
  • 5
    Nice answer but 1 remark about your `O(count of min) space`. If you had an input of a million `Foo(2)` followed by a single `Foo(1)` you'd still store that million `Foo(2)` elements first. – Imus Feb 28 '18 at 08:27
  • @Holger The combiner doesn't have to worry about null `min`; it's initialized to `Integer.MAX_VALUE` and `accumulate` specifically avoids writing `null` to it. However, you are absolutely correct about everything else you stated, and I have improved my answer based on what you stated. – rgettman Feb 28 '18 at 17:37
  • @Imus That's a good point, Big-O is worst case, which is O(n). I've updated that point, but it certainly won't store all items if the cardinality is more than one. – rgettman Feb 28 '18 at 17:38
  • @rgettman yes, I already noticed it but after the five minute period, so I could not edit the comment. I was misguided by the `Integer` type so I thought you’re using `null` to tell initial state and actual values apart; using `Integer.MAX_VALUE` implies that in the unlikely case of that value being the min value (the only value), the code will fail. You could fix that by using `long` and `Long.MAX_VALUE` as initial value; the performance won’t be worse than the avoidable unboxing. – Holger Feb 28 '18 at 17:47
  • …and you can simplify the `accumulate` method by reducing the redundancy: `Integer i = f.getVariableCount(); if(i != null && !(i > min)) { if(i < min) { min = i; foos.clear(); } foos.add(f); }` – Holger Feb 28 '18 at 17:52
6

IF you are OK streaming (iterating) twice:

private static List<Foo> mins(List<Foo> foos) {
    return foos.stream()
            .map(Foo::getVariableCount)
            .min(Comparator.naturalOrder())
            .map(x -> foos.stream()
                          .filter(y -> y.getVariableCount() == x)
                          .collect(Collectors.toList()))
            .orElse(Collections.emptyList());
}
Eugene
  • 117,005
  • 15
  • 201
  • 306
  • by default if there is nothing stream will return emptyList. – Adesh Kumar Feb 27 '18 at 18:20
  • @AdeshKumar The last map is from Optional – fps Feb 27 '18 at 18:24
  • @AdeshKumar I don't even understand what you actually mean here :| – Eugene Feb 27 '18 at 19:29
  • It would probably be more readable to first compute an `int x = foos.stream().map(Foo::getVariableCount).min(Comparator.naturalOrder()).orElse(0);` and then filter the second stream using that. (Oh, and rename it to something other than `x`.) – Ilmari Karonen Feb 28 '18 at 09:58
  • @IlmariKaronen **probably**... I find it pretty readable the way it is now. because of `orElse(0)` btw, there would be no need to filter it again, since this means that the source of the stream is empty anyways, so rather a check for `!isPresent() {returnCollections.emptyList() }` would be more appropriate; point is I *wanted* to do it in a single stream chained operation – Eugene Feb 28 '18 at 10:05
  • Filtering an empty stream is pretty efficient anyway. In terms of readability, probably the best solution IMO would be to check for and handle the empty list case *before* trying to find the minimum, and just replacing the `.orElse()` with `.get()`. – Ilmari Karonen Feb 28 '18 at 12:58
  • @IlmariKaronen I think that this is what I've initially done - you can see the edits of this answer, as said, I really wanted to do it in a single chained statements – Eugene Feb 28 '18 at 19:08
1

To avoid creating the map you could use two streams :

  • the first finds the minimum value.
  • the second filters elements with this value.

It could give :

List<Foo> foos = ...;
int min = foos.stream()
              .mapToInt(Foo::getVariableCount)
              .min()
              .orElseThrow(RuntimeException::new); // technical error

List<Foo> minFoos = foos.stream()
    .filter(f -> f.getVariableCount() == min)
    .collect(Collectors.toList());
davidxxx
  • 125,838
  • 23
  • 214
  • 215
1

To avoid creating the entire map and also avoiding streaming twice, I copied a custom collector from here https://stackoverflow.com/a/30497254/1264846 and modified it to work with min instead of max. I didn't even know custom collectors were possible so I thank @lexicore for pointing me in that direction.

This is the resulting function minAll

public static <T, A, D> Collector<T, ?, D> minAll(Comparator<? super T> comparator,
                                                  Collector<? super T, A, D> downstream) {
    Supplier<A> downstreamSupplier = downstream.supplier();
    BiConsumer<A, ? super T> downstreamAccumulator = downstream.accumulator();
    BinaryOperator<A> downstreamCombiner = downstream.combiner();
    class Container {
        A acc;
        T obj;
        boolean hasAny;

        Container(A acc) {
            this.acc = acc;
        }
    }
    Supplier<Container> supplier = () -> new Container(downstreamSupplier.get());
    BiConsumer<Container, T> accumulator = (acc, t) -> {
        if(!acc.hasAny) {
            downstreamAccumulator.accept(acc.acc, t);
            acc.obj = t;
            acc.hasAny = true;
        } else {
            int cmp = comparator.compare(t, acc.obj);
            if (cmp < 0) {
                acc.acc = downstreamSupplier.get();
                acc.obj = t;
            }
            if (cmp <= 0)
                downstreamAccumulator.accept(acc.acc, t);
        }
    };
    BinaryOperator<Container> combiner = (acc1, acc2) -> {
        if (!acc2.hasAny) {
            return acc1;
        }
        if (!acc1.hasAny) {
            return acc2;
        }
        int cmp = comparator.compare(acc1.obj, acc2.obj);
        if (cmp < 0) {
            return acc1;
        }
        if (cmp > 0) {
            return acc2;
        }
        acc1.acc = downstreamCombiner.apply(acc1.acc, acc2.acc);
        return acc1;
    };
    Function<Container, D> finisher = acc -> downstream.finisher().apply(acc.acc);
    return Collector.of(supplier, accumulator, combiner, finisher);
}
James Kleeh
  • 12,094
  • 5
  • 34
  • 61
1

Here is alternative with one stream and custom reducer. The idea is to first sort and then collect only elements with first min value:

    List<Foo> newlist = list.stream()
    .sorted( Comparator.comparing(Foo::getVariableCount) )
    .reduce( new ArrayList<>(), 
         (l, f) -> { 
             if ( l.isEmpty() || l.get(0).getVariableCount() == f.getVariableCount() ) l.add(f); 
             return l;
         }, 
         (l1, l2) -> {
             l1.addAll(l2); 
             return l1;
         } 
    );

Or using collect is even more compact:

    List<Foo> newlist = list.stream()
    .sorted( Comparator.comparing(Foo::getVariableCount) )
    .collect( ArrayList::new, 
         (l, f) -> if ( l.isEmpty() || l.get(0).getVariableCount() == f.getVariableCount() ) l.add(f),
         List::addAll
    );
shmosel
  • 49,289
  • 6
  • 73
  • 138
tsolakp
  • 5,858
  • 1
  • 22
  • 28
  • Beware of using sorted. I believe this sticks the whole thing into an array and then just uses Arrays.sort. If you’re going to do that you might just as well collect the original stream into a list and then do the obvious two iterations solution. – cpp beginner Feb 27 '18 at 17:55
1

You could use collect wisely on the sorted list and in accumulator add the logic to add only either first element to empty list or add any other Foo having variable count same as of the first element of the list.

A complete working example below:-

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class Foo {
    private int variableCount;

    public Foo(int vars) {
        this.variableCount = vars;
    }

    public Integer getVariableCount() {
        return variableCount;
    }

    public static void main(String[] args) {
        List<Foo> list = Arrays.asList(
                new Foo(2),
                new Foo(2),
                new Foo(3),
                new Foo(3),
                new Foo(1),
                new Foo(1)
        );

        System.out.println(list.stream()
                .sorted(Comparator.comparing(Foo::getVariableCount))
                .collect(() -> new ArrayList<Foo>(),
                        (ArrayList<Foo> arrayList, Foo e) -> {
                            if (arrayList.isEmpty()
                                    || arrayList.get(0).getVariableCount() == e.getVariableCount()) {
                                arrayList.add(e);
                            }
                        },
                        (ArrayList<Foo> foos, ArrayList<Foo> foo) -> foos.addAll(foo)
                )

        );
    }

    @Override
    public String toString() {
        return "Foo{" +
                "variableCount=" + variableCount +
                '}';
    }
}

Also, you could first find the minimum variableCount in one stream and use that inside filter of another stream.

    list.sort(Comparator.comparing(Foo::getVariableCount));
    int min = list.get(0).getVariableCount();
    list.stream().filter(foo -> foo.getVariableCount() == min)
            .collect(Collectors.toList());

I think in any case either sorting is required or a way to find the minimum number which later can be used inside the predicate. Even if you are using the map to group the values.

Cheers!

Vinay Prajapati
  • 7,199
  • 9
  • 45
  • 86