4

How can one idiomatically enumerate a Stream<T> which maps each T instance to a unique integer using Java 8 stream methods (e.g. for an array T[] values, creating a Map<T,Integer> where Map.get(values[i]) == i evaluates to true)?

Currently, I'm defining an anonymous class which increments an int field for use with the Collectors.toMap(..) method:

private static <T> Map<T, Integer> createIdMap(final Stream<T> values) {
    return values.collect(Collectors.toMap(Function.identity(), new Function<T, Integer>() {

        private int nextId = 0;

        @Override
        public Integer apply(final T t) {
            return nextId++;
        }

    }));
}

However, is there not a more concise/elegant way of doing this using the Java 8 stream API? — bonus points if it can be safely parallelized.

errantlinguist
  • 3,658
  • 4
  • 18
  • 41

3 Answers3

5

Your approach will fail, if there is a duplicate element.

Besides that, your task requires mutable state, hence, can be solved with Mutable reduction. When we populate a map, we can simple use the map’s size to get an unused id.

The trickier part is the merge operation. The following operation simply repeats the assignments for the right map, which will handle potential duplicates.

private static <T> Map<T, Integer> createIdMap(Stream<T> values) {
    return values.collect(HashMap::new, (m,t) -> m.putIfAbsent(t,m.size()),
        (m1,m2) -> {
            if(m1.isEmpty()) m1.putAll(m2);
            else m2.keySet().forEach(t -> m1.putIfAbsent(t, m1.size()));
        });
}

If we rely on unique elements, or insert an explicit distinct(), we can use

private static <T> Map<T, Integer> createIdMap(Stream<T> values) {
    return values.distinct().collect(HashMap::new, (m,t) -> m.put(t,m.size()),
        (m1,m2) -> { int leftSize=m1.size();
            if(leftSize==0) m1.putAll(m2);
            else m2.forEach((t,id) -> m1.put(t, leftSize+id));
        });

}
Holger
  • 285,553
  • 42
  • 434
  • 765
  • I like the trick about the map size. clever. but why do you need the check `if(leftSize==0)`? this is a non-concurrent collector, so the supplier is going to be called for as many elements there are in the stream and then the accumulator is going to put an element in an empty map before the combiner – Eugene Feb 14 '17 at 16:10
  • 1
    @Eugene: the merge function will be called for partial results. These depend on the splitting capability of the stream source (i.e. whether it can split balanced) and whether there are size changing operations in-between (`filter` or `flatMap`). So it is possible that the combiner function gets called with empty partial results. That’s still not making the test for empty maps *required* here, as the ordinary merge operation would do the right thing. It’s only an optimization, using a cheap test and simplifying the operation in that case. – Holger Feb 14 '17 at 16:59
  • 1
    The three-arg `collect` method doesn’t allow to max that out, though. If you create a custom collector via `Collector.of`, you may even return the second map, if the first is empty, omitting the entire `putAll` operation. If you are interesting in more details regarding the work splitting and potential empty partial results, you may consider [this Q&A](http://stackoverflow.com/q/34381805/2711488). – Holger Feb 14 '17 at 17:05
4

I would do it in this way:

private static <T> Map<T, Integer> createIdMap2(final Stream<T> values) {
    List<T> list = values.collect(Collectors.toList());
    return IntStream.range(0, list.size()).boxed()
            .collect(Collectors.toMap(list::get, Function.identity()));
}

For sake or parallelism, it can be changed to

   return IntStream.range(0, list.size()).parallel().boxed().
                (...)
Andremoniy
  • 34,031
  • 20
  • 135
  • 241
  • 1
    You are quite right, I didn't noticed that `parallel()` can be invoked directly, thanks – Andremoniy Feb 14 '17 at 15:19
  • 3
    Your solution is simple and sufficient, if you can afford the intermediate storage. As said, you can use `List list = values.distinct().collect(Collectors.toList());` if there are duplicates expected. Though, well, you don’t have to; the ids will be unique in either case and nobody said that they are not allowed to have gaps… – Holger Feb 14 '17 at 15:22
  • This answer is much more readable than [Holger's](http://stackoverflow.com/a/42229622/1391325) but it would be nice to not have to use an intermediate list. Also sad is that it seems that [`Collectors.toList()`](https://docs.oracle.com/javase/8/docs/api/java/util/stream/Collectors.html#toList--) isn't guaranteed to return a random-access list, meaning that the complexity of your solution could vary wildly; Would something in Java's standard library analogous to Python's [`enumerate(iterable)`](https://docs.python.org/3/library/functions.html#enumerate) not be great? – errantlinguist Feb 16 '17 at 10:41
  • @errantlinguist: in principle, you are right in that the documentation doesn’t state that the list will have random access, but on the other hand, the random access property is not within the explicitly named properties that are unspecified (type, mutability, serializability, thread-safety). I think, it is reasonable to assume that the returned list will never have an expensive `get` operation. And this solution will still work, if this assumption doesn’t hold, it’s just slower (at the responsibility of whoever decided to let `toList()` return a list without random access)… – Holger Feb 16 '17 at 10:52
0

Comparing to convert the input stream to List first in the solution provided by Andremoniy. I would prefer to do it in different way because we don't know the cost of "toList()" and "list.get(i)", and it's unnecessary to create an extra List, which could be small or bigger

private static <T> Map<T, Integer> createIdMap2(final Stream<T> values) {
    final MutableInt idx = MutableInt.of(0); // Or: final AtomicInteger idx = new AtomicInteger(0);        
    return values.collect(Collectors.toMap(Function.identity(), e -> idx.getAndIncrement()));
}

Regardless to the question, I think it's a bad design to pass streams as parameters in a method.

user_3380739
  • 1
  • 14
  • 14