37

I have a stream of objects and I would like to find the one with a maximal value of some attribute that's expensive to calculate.

As a specific simple example, say that we have a list of strings and we want to find the coolest one, given a coolnessIndex function.

The following should work:

String coolestString = stringList
        .stream()
        .max((s1, s2) -> Integer.compare(coolnessIndex(s1), coolnessIndex(s2)))
        .orElse(null);

Now, there are two problems with this. First, assuming the coolnessIndex is expensive to calculate, this probably won't be very efficient. I suppose the max method will need to use the comparator repeatedly, which in turn will call the coolnessIndex repeatedly and at the end it will be called more than once for each string.

Second, having to provide the comparator leads to some redundancy in the code. I would much prefer syntax like this:

String coolestString = stringList
        .stream()
        .maxByAttribute(s -> coolnessIndex(s))
        .orElse(null);

However, I haven't been able to find a matching method in the Stream API. This surprises me, since finding min/max by an attribute seems like a common pattern. I wonder if there's a better way than using the comparator (other than a for loop).

Alexis C.
  • 91,686
  • 21
  • 171
  • 177
Jan Pomikálek
  • 1,369
  • 2
  • 13
  • 22
  • 2
    Related but not quite duplicate: http://stackoverflow.com/questions/27606185/arg-max-in-java-8-streams (where the concern is code brevity rather than efficiency, and I think the recommended solution still ends up calling the equivalent of `coolnessIndex` repeatedly). – Gareth McCaughan Apr 21 '16 at 16:31
  • I don't believe there is anything equivalent to this in the Java streams API. You could implement your own version of `maxByAttribute` (someone else has done something of the sort [here](https://gist.github.com/mapio/57299694ef94cc88dddb) but I haven't looked at their code), or you could use `map` to get a stream of pairs (`s`,`coolnessIndex(s)`) and then `max` those -- but AIUI Java doesn't have a handy pair class so you'd end up with a lot of boilerplate code, not to mention all the extra memory allocations. – Gareth McCaughan Apr 21 '16 at 16:35
  • You may group by the string->coolness results in a map and then choose the coolest string. See https://docs.oracle.com/javase/8/docs/api/java/util/stream/Collectors.html#toMap-java.util.function.Function-java.util.function.Function- – Kedar Mhaswade Apr 21 '16 at 18:17
  • After more searching I found the same problem discussed here: http://stackoverflow.com/questions/32878398/efficiency-of-the-way-comparator-works. I like the answer from bayou.io, in which he offers an elegant generic solution using a cache function. – Jan Pomikálek Apr 26 '16 at 19:47

9 Answers9

11
Stream<String> stringStream = stringList.stream();
String coolest = stringStream.reduce((a,b)-> 
    coolnessIndex(a) > coolnessIndex(b) ? a:b;
).get()

if calling coolnessIndex is expensive we can use distinct so it's called only for distinct elements (assuming that the coolnesIndex is the same for equal elements)

Stream<String> stringStream = stringList.stream().distinct();
String coolest = stringStream.reduce((a,b)-> 
    coolnessIndex(a) > coolnessIndex(b) ? a:b;
).get()

Stream distinct()

frhack
  • 4,862
  • 2
  • 28
  • 25
  • While this works, it doesn't seem to address OP's problem of redundantly calling `coolnessIndex` – kane Nov 29 '21 at 23:23
6

Here's a variant using an Object[] as a tuple, not the prettiest code but concise

String coolestString = stringList
        .stream()
        .map(s -> new Object[] {s, coolnessIndex(s)})
        .max(Comparator.comparingInt(a -> (int)a[1]))
        .map(a -> (String)a[0])
        .orElse(null);
gustf
  • 1,959
  • 13
  • 20
4

Thanks everyone for suggestions. At last I found the solution I like the most at Efficiency of the way comparator works -- the answer from bayou.io:

Have a general purpose cache method:

public static <K,V> Function<K,V> cache(Function<K,V> f, Map<K,V> cache)
{
    return k -> cache.computeIfAbsent(k, f);
}

public static <K,V> Function<K,V> cache(Function<K,V> f)
{
    return cache(f, new IdentityHashMap<>());
}

This could then be used as follows:

String coolestString = stringList
        .stream()
        .max(Comparator.comparing(cache(CoolUtil::coolnessIndex)))
        .orElse(null);
Community
  • 1
  • 1
Jan Pomikálek
  • 1,369
  • 2
  • 13
  • 22
  • would it be possible to have only one function `cache` that creates the IdentityHashMap and returns the desired function with the cache as closure? – Jan Jan 18 '22 at 15:27
0

How about using two streams, one to create a map with the pre-calculated values and a second using the map's entry set to find the max value:

String coolestString = stringList
        .stream()
        .collect(Collectors.toMap(Function.identity(), Test::coolnessIndex))
        .entrySet()
        .stream()
        .max((s1, s2) -> Integer.compare(s1.getValue(), s2.getValue()))
        .orElse(null)
        .getKey();
Community
  • 1
  • 1
kensei62
  • 144
  • 9
  • Or you can use `TreeMap` to collect the entries in a manner sorted by coolness index -- this helps you get rid of another stream. – Kedar Mhaswade Apr 21 '16 at 18:19
0

This is a reduction problem. Reducing a list down to a specific value. In general reduce works down the list operating on a partial solution and an item in the list. In this case, that would mean comparing the previous 'winning' value to the new value from the list which will calculate the expensive operation twice on each comparison.

According to https://docs.oracle.com/javase/tutorial/collections/streams/reduction.html an alternative is to use collect instead of reduce.

A custom consumer class will allow keeping track of the expensive operations as it reduces the list. Consumer can get around the multiple calls to the expensive calculation by working with mutable state.

class Cooler implements Consumer<String> {
    String coolestString = "";
    int coolestValue = 0;

    public String coolest() {
        return coolestString;
    }

    @Override
    public void accept(String arg0) {
        combine(arg0, expensive(arg0));
    }

    private void combine(String other, int exp) {
        if (coolestValue < exp) {
            coolestString = other;
            coolestValue = exp;
        }
    }

    public void combine(Cooler other) {
        combine(other.coolestString, other.coolestValue);
    }
}

This class accepts a string and if it is cooler than the previous winner, it replaces it and saves the expensive calculated value.

Cooler cooler = Stream.of("java", "php", "clojure", "c", "lisp")
        .collect(Cooler::new, Cooler::accept, Cooler::combine);
System.out.println(cooler.coolest());
Community
  • 1
  • 1
GregA100k
  • 1,385
  • 1
  • 11
  • 16
0

If speed/overhead is important, you don't want to use Stream.max(Comparator) which recomputes the score many times for the winning object; the cache solution above works, but with substantial O(N) overhead. The decorator pattern takes more memory allocation/GC overhead.

Here's a beautiful, reusable solution for your utility library & Java 15:

/** return argmin item, else null if none.  NAN scores are skipped */
public static <T> T argmin(Stream<T> stream, ToDoubleFunction<T> scorer) {
    Double min = null;
    T argmin = null;
    for (T p: (Iterable<T>) stream::iterator) {
        double score = scorer.applyAsDouble(p);
        if (min==null || min > score) {
            min = score;
            argmin = p;
        }
    }
    return argmin;
}
George Forman
  • 600
  • 5
  • 7
-1

I would create a local class (a class defined inside a method—rare, but perfectly legal), and map your objects to that, so the expensive attribute is computed exactly once for each:

class IndexedString {
    final String string;
    final int index;

    IndexedString(String s) {
        this.string = Objects.requireNonNull(s);
        this.index = coolnessIndex(s);
    }

    String getString() {
        return string;
    }

    int getIndex() {
        return index;
    }
}

String coolestString = stringList
    .stream()
    .map(IndexedString::new)
    .max(Comparator.comparingInt(IndexedString::getIndex))
    .map(IndexedString::getString)
    .orElse(null);
VGR
  • 40,506
  • 4
  • 48
  • 63
-1

You can utilize the idea of collecting the results from the stream appropriately. The constraint of expensive coolness calculation function makes you consider calling that function exactly once for each element of the stream.

Java 8 provides the collect method on the Stream and a variety of ways in which you can use collectors. It appears that if you used the TreeMap to collect your results, you can retain the expressiveness and at the same time remain considerate of efficiency:

public class Expensive {
    static final Random r = new Random();
    public static void main(String[] args) {
        Map.Entry<Integer, String> e =
        Stream.of("larry", "moe", "curly", "iggy")
                .collect(Collectors.toMap(Expensive::coolness,
                                          Function.identity(),
                                          (a, b) -> a,
                                          () -> new TreeMap<>
                                          ((x, y) -> Integer.compare(y, x))
                        ))
                .firstEntry();
        System.out.println("coolest stooge name: " + e.getKey() + ", coolness: " + e.getValue());
    }

    public static int coolness(String s) {
        // simulation of a call that takes time.
        int x = r.nextInt(100);
        System.out.println(x);
        return x;
    }
}

This code prints the stooge with maximum coolness and the coolness method is called exactly once for each stooge. The BinaryOperator that works as the mergeFunction ((a, b) ->a) can be further improved.

Kedar Mhaswade
  • 4,535
  • 2
  • 25
  • 34
-1

just create your (object,metric) pairs first:

public static <T> Optional<T> maximizeOver(List<T> ts, Function<T, Integer> f) {
    return ts.stream().map(t -> Pair.pair(t, f.apply(t)))
            .max((p1, p2) -> Integer.compare(p1.second(), p2.second()))
            .map(Pair::first);
}

(those are com.googlecode.totallylazy.Pair's)

Community
  • 1
  • 1
Paul Janssens
  • 622
  • 3
  • 9