2

For a range of integers, I would like to apply an ("expensive") operation, filter out only those integers with interesting answers, then group on the answer.

This first snippet works, but it duplicates the operation ("modulus 2") both in code and computation:

IntStream.range(1, 10).boxed()
         .filter(p -> (p % 2 != 0))                     // Expensive computation
         .collect(Collectors.groupingBy(p -> (p % 2))); // Same expensive
                                                        // computation!

// {1=[1, 3, 5, 7, 9]} (Correct answer)

I tried mapping to the answer first, then filter, then group - but the initial Integer of course gets lost along the way:

IntStream.range(1, 10).boxed()
         .map(p -> p % 2)                        // Expensive computation
         .filter(p -> p != 0)
         .collect(Collectors.groupingBy(p -> p));

// {1=[1, 1, 1, 1, 1]} (Of course, wrong answer)

I would like to map to a tuple or something like that, but haven't figured out a clean way to do it.

Tagir Valeev
  • 97,161
  • 19
  • 222
  • 334
Markus
  • 809
  • 5
  • 10
  • 2
    Could you explain what you are trying to do? The only explanation we have is that you are trying to do "an expensive" operation (modulo isn't really expensive to me but ok I'll go with the flow). But what is the input and what is the expected output? – Juru May 19 '15 at 21:20
  • 6
    Probably the best way is to create a small value class (helper class) to contain both the original int and the result of the expensive operation. See http://stackoverflow.com/a/29340148/1441122 . – Stuart Marks May 19 '15 at 21:31
  • @StuartMarks, thats a good answer you linked to. A helper class, ultimately imitating a tuple, would work. But it feels somewhat bulky and old-timey. – Markus May 19 '15 at 21:39
  • 1
    Why? If you don't want to do the same expensive computation twice, you need to store it somewhere and this is achieved by having a custom class handling these results. You can use an `int[2]` array, but you loose a bit of semantics. – Alexis C. May 19 '15 at 21:40
  • @Markus do you know who Stuart Marks is? If he says a helper class is the way to go then that is the best way ;P, even if you feel its old timey. – Juru May 19 '15 at 21:42
  • 2
    @Markus You could do something like this: `IntStream.range(1, 10).mapToObj(p -> new int[]{p, p % 2}).filter(arr -> arr[1] != 0).collect(groupingBy(arr -> arr[1], mapping(arr -> arr[0], toList())));` – Alexis C. May 19 '15 at 21:49
  • 1
  • 1
    @Juru Also note my preferred style for this sort of thing is public final fields initialized by a constructor, with no setters or getters. That cuts down on the bulk somewhat. – Stuart Marks May 19 '15 at 22:00
  • @StuartMarks Sorry for the "old-timey", didn't really mean it. :) – Markus May 19 '15 at 22:06
  • @AlexisC. I like! See (my own) answer below, inspired by your grouping example. – Markus May 19 '15 at 22:09
  • The common themes among the answers seems to be to either emulate a tuple in some way - a private helper class, an `int[]` array or `AbstractMap.SimpleEntry` - or to first collect input and answers to a map which then is grouped. The tuple emulation feels more straight-forward, and as a seasoned Java programmer, creating helper classes does not frighten me, so that will be the accepted answer. Thank you all so much for your efforts in this question! – Markus May 20 '15 at 17:47

6 Answers6

4

Well, since I described what I'd do, and there's been a bit of discussion about it, I figured I should write down what I was describing:

    class IntPair {
        final int input, result;
        IntPair(int i, int r) { input = i; result = r; }
    }

    Map<Integer, List<Integer>> output =
        IntStream.range(1, 10)
            .mapToObj(i -> new IntPair(i, i % 2))
            .filter(pair -> pair.result != 0)
            .collect(groupingBy(pair -> pair.result,
                mapping(pair -> pair.input, toList())));

Note that the helper class can (and probably should) be a nested class of some sort, or even a local class.

One thing that's nice about having names for the fields is that it does make it easier to understand what's going on. When I initially wrote this up, I had inadvertently reversed the roles of input and result in the grouping operation and thus I got the incorrect result. After rereading the code it was quite easy for me to see that I was grouping by the input values instead result values, and it was also very easy to fix. This would be harder to diagnose and fix if I had to use arr[0] and arr[1] or tuple.t1 and tuple.t2.

Stuart Marks
  • 127,867
  • 37
  • 205
  • 259
2

Why not group on the result of the expensive computation and then filter the resulting map?

IntStream.range(1, 10).boxed()
        .collect(groupingBy(x -> x % 2))
        .entrySet().stream()
        .filter(e -> e.getKey() != 0)
        .collect(toMap(Map.Entry::getKey, Map.Entry::getValue));
Misha
  • 27,433
  • 6
  • 62
  • 78
  • Not a one liner, but probably more efficient to build the map first with your `groupingBy`. And then call `map.remove(0);`. – Alexis C. May 20 '15 at 08:32
  • it would have to be `map.keySet().removeIf(x->x==0)`, since `!=0` is just a placeholder for "interesting" test. Also, that approach will require 3-argument `groupingBy` with `HashMap::new` -- `groupingBy` doesn't guarantee a mutable map. Yes, it will be slightly more efficient that way, but the question is about grouping on a result of an **expensive** computation, so it shouldn't matter. – Misha May 20 '15 at 08:40
1

If you want to implement this via single stream (without collecting to the intermediate map), you can do it like this:

IntStream.range(1, 10).boxed()
    .map(p -> new AbstractMap.SimpleEntry<>(p % 2, p))
    .filter(entry -> entry.getKey() != 0)
    .collect(Collectors.groupingBy(Entry::getKey,
             Collectors.mapping(Entry::getValue, Collectors.toList())));

If you don't mind using third-party code, my StreamEx library has syntactic sugar especially for such tasks:

IntStreamEx.range(1, 10).boxed()
     // map to Map.Entry where keys are your expensive computation
     // and values are input elements. The result is EntryStream
     // which implements the Stream<Map.Entry> and has additional methods
    .mapToEntry(p -> p % 2, Function.identity())
    .filterKeys(k -> k != 0)
    .grouping();

Internally it's pretty much the same as the first solution.

Tagir Valeev
  • 97,161
  • 19
  • 222
  • 334
1

The general solution is to remember the result of the computation. E.g. like suggested by Stuart Marks but if you don’t want to introduce a new type you may use an int array instead:

Map<Integer, List<Integer>> map = IntStream.range(1, 10)
    .mapToObj(i -> new int[]{i, i % 2})
    .filter(pair -> pair[1] != 0)
    .collect(groupingBy(pair -> pair[1],
        mapping(pair -> pair[0], toList())));

The specific solution to this special case is to replace the expensive operation i%2 by the simple operation i&1:

Map<Integer, List<Integer>> map = IntStream.range(1, 10)
    .filter(i-> (i&1)!=0)
    .boxed().collect(groupingBy(i->i&1));

That operation is so cheap that I wouldn’t care about it’s repetition. But if you do, of course

Map<Integer, List<Integer>> map = IntStream.range(1, 10)
    .filter(i-> (i&1)!=0)
    .boxed().collect(groupingBy(i->1));

or

Map<Integer, List<Integer>> map = Collections.singletonMap(1, 
    IntStream.range(1, 10).filter(i-> (i&1)!=0)
             .boxed().collect(toList()));

will solve the problem. It is, of course, not a re-usable solution but lambda expressions are one-use code fragments anyway.

Community
  • 1
  • 1
Holger
  • 285,553
  • 42
  • 434
  • 765
0

One way is to collect the numbers and answers to Map<Integer, Integer> first, then operate on the stream of entries:

IntStream.range(1, 10).boxed()
         .collect(toMap(p -> p, p -> p % 2)) 
         .entrySet().stream()
         .filter(p -> p.getValue() != 0)
         .collect(groupingBy(p -> p.getValue(),
                  mapping(p -> p.getKey(),
                  toList())));

Not sure about the performance implications if this though.

Markus
  • 809
  • 5
  • 10
  • 2
    Why not simply `range(...).mapToObj(p -> new SimpleEntry<>(p % 2, p)).filter(e -> e.getKey() != 0).collect(groupingBy(SimpleEntry::getKey, mapping(SimpleEntry::getValue, toList())));` then? – Alexis C. May 19 '15 at 22:06
  • 1
    Do a small test where you perform this implementation 1 milion times and the other correct one in your question and see how long both took, that should give you an idea about the performance implications. – Juru May 19 '15 at 22:13
  • @AlexisC. The SimpleEntry is defined in AbstractMap. Means including AbstractMap static or `new AbstractMap.SimpleEntry<>(...)`. – Markus May 19 '15 at 22:22
  • 1
    @Markus Well it's just an import, like any other `import java.util.AbstractMap.SimpleEntry;` – Alexis C. May 19 '15 at 22:37
  • @Markus I did performance tests in my answer, it is faster than the original implementation but the one suggested by StuartMarks still outperforms it. – Juru May 19 '15 at 23:03
  • @Juru My example becomes more and more misleading. :) The performance issue I worried about is the `.collect` on row 2 which creates objects that will be discarded in the `.filter` on row 4. Now imagine if just one Entry in 1000 or so passes the filter... – Markus May 20 '15 at 17:24
  • @Juru My mistake, the entry will be created anyway - the only addition is creating a Map which should be negligible. – Markus May 20 '15 at 17:50
0

So here is my solution :). Maybe not the best.

import java.util.stream.IntStream;
import java.util.stream.Collectors;
import java.util.Map;
import java.util.List;

public class Test {

    private static final class Tuple {
        private final Integer t1;
        private final Integer t2;

        private Tuple(final Integer t1, final Integer t2) {
            this.t1 = t1;
            this.t2 = t2;
        }

        public Integer getT1() { return t1; }
        public Integer getT2() { return t2; }
    }

    private static String getValues(final List<Tuple> list) {
        final StringBuilder stringBuilder = new StringBuilder();
        for(Tuple t : list) {
            stringBuilder.append(t.getT2()).append(", ");
        }
        return stringBuilder.toString();
    }

     public static void main(String []args){

        Map<Integer, List<Tuple>> results =
        IntStream.range(1, 10).boxed()
         .map(p -> new Tuple(p % 2, p))                     // Expensive computation
         .filter(p -> p.getT1() != 0)
         .collect(Collectors.groupingBy(p -> p.getT1())); 

        results.forEach((k, v) -> System.out.println(k + "=" + getValues(v)));
     }

}

Output is 1=1, 3, 5, 7, 9;

About performance:

sh-4.3# java -Xmx128M -Xms16M Test                                                                                                                                                                     
Time a1: 231                                                                                                                                                                                                  
Time a2: 125                                                                                                                                                                                                  
sh-4.3# java -Xmx128M -Xms16M Test
Time a1: 211                                                                                                                                                                                                  
Time a2: 127                                                                                                                                                                                                  
sh-4.3# java -Xmx128M -Xms16M Test
Time a1: 172                                                                                                                                                                                                  
Time a2: 100 

A1 is your first algorithm in the question and A2 is mine in this answer. So it is faster even with a helper class.

Here is performance measurements also including the algorithm in your answer (as A3):

sh-4.3# java -Xmx128M -Xms16M HelloWorld                                                                                                                                                                      
Time a1: 202                                                                                                                                                                                                  
Time a2: 113                                                                                                                                                                                                  
Time a2: 170                                                                                                                                                                                                  
sh-4.3# java -Xmx128M -Xms16M HelloWorld                                                                                                                                                                      
Time a1: 195                                                                                                                                                                                                  
Time a2: 114                                                                                                                                                                                                  
Time a2: 169 

I find it weird that yours is outperformed by mine, thought it would be more or less the same.

Juru
  • 1,623
  • 17
  • 43