9

How do you convert a collection like ["a", "b", "c"] to a map like {"a": 0, "b": 1, "c": 2}, with the values being the position in iteration order? Is there a one-liner with streams and collectors in JDK 8 for it? The old-fashioned way is like this:

    Collection<String> col = apiCall();
    Map<String, Integer> map = new HashMap<>();
    int pos = 0;
    for (String s : col) {
        map.put(s, pos++);
    }
danial
  • 2
    Note that your code will create the map `{"a": 0, "b": 1, "c": 2}`, and the answers so far have followed along. – ajb Jul 30 '14 at 15:49
  • You may be interested in http://stackoverflow.com/questions/17640754/zipping-streams-using-jdk8-with-lambda-java-util-stream-streams-zip – Brian Agnew Jul 30 '14 at 15:54

6 Answers

6

If you don't need a parallel stream, you can use the size of the map as an index counter:

collection.stream().forEach(i -> map.put(i, map.size() + 1));
maba
  • 3
    This is a bad solution, I'm afraid: it's not thread-safe, since different elements may be processed by different threads, even on a sequential stream. Also (less important) forEach is not guaranteed to maintain order. – Maurice Naftalin Jul 30 '14 at 20:55
  • @Maurice Naftalin: “sequential” is the *opposite* of “parallel”. Possibly you meant “ordered” which can be still parallel. – Holger Jul 31 '14 at 15:02
  • @Holger Sorry, I have confused this thread by deleting the comment you responded to, because I wanted to revise it. For anyone reading this, what I said was: "No, I meant what I said: "even on a sequential stream". The documentation for java.util.stream says (under "Side-effects"): "Even when a pipeline is constrained to produce a result that is consistent with the encounter order of the stream source, no guarantees are made as to ... what thread any behavioral parameter is executed for a given element." – Maurice Naftalin Aug 01 '14 at 09:09
  • @Holger The problem with the comment that I deleted is that it's right but inapplicable; the phrase I referred to documents the behaviour of intermediate operations. The documentation I needed here is for `forEach`: "For any given element, the action may be performed at whatever time and in whatever thread the library chooses. If the action accesses shared state, it is responsible for providing the required synchronization." So this solution is not thread-safe. (`forEachOrdered` does not suffer the same problem, so your solution below works). – Maurice Naftalin Aug 01 '14 at 09:18
  • @Maurice Naftalin: we can delete the comments then. Regarding `forEach`, I’m not sure whether the quoted sentences still belong to the preceding one which starts with “For parallel stream pipelines,”. – Holger Aug 01 '14 at 09:26
  • @Holger Reading it literally, it doesn't. Actually, I believe that the qualifier on the previous sentence shouldn't be there either; it contradicts the first sentence "... explicitly nondeterministic". You're not meant to rely on any behavior that is correct only in sequential mode. – Maurice Naftalin Aug 01 '14 at 11:20
  • @Holger It seems I was wrong about the qualifier on the previous sentence. You can in fact rely on `forEach` respecting encounter order in sequential streams. – Maurice Naftalin Aug 02 '14 at 18:03
6

Here's an approach:

List<String> list = Arrays.asList("a", "b", "c");

Map<String, Integer> map =
    IntStream.range(0, list.size())
        .boxed()
        .collect(toMap(i -> list.get(i), i -> i));

Not necessarily a one-liner or shorter than the straightforward loop, but it does work using a parallel stream if you change toMap to toConcurrentMap.

Also note, this assumes that you have a random-access list, not a general Collection. If you have a Collection that you otherwise can make no assumptions about, there's not much you can do other than to iterate over it sequentially and increment a counter.
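For a general Collection, one way to get the random-access list this approach needs is to copy the input first, which is itself a single sequential pass. The following is only a sketch under that assumption; the class and method names (`IndexByPosition`, `toIndexMap`) are hypothetical and not part of any library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical helper class; the names are illustrative only.
final class IndexByPosition {

    // Copies the collection into an ArrayList (one sequential pass over the input),
    // then maps each element to its position in that copy.
    static <T> Map<T, Integer> toIndexMap(Collection<T> col) {
        List<T> snapshot = new ArrayList<>(col);
        return IntStream.range(0, snapshot.size())
                .boxed()
                .collect(Collectors.toMap(i -> snapshot.get(i), i -> i));
    }

    public static void main(String[] args) {
        Map<String, Integer> map = toIndexMap(Arrays.asList("a", "b", "c"));
        System.out.println(map);
    }
}
```

Note that `Collectors.toMap` without a merge function throws an `IllegalStateException` on duplicate keys, so this sketch assumes the elements are distinct.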

UPDATE

The OP has clarified that the input is a Collection and not a List so the above doesn't apply. It seems that we can assume very little about the input Collection. The OP has specified iteration order. With a sequential iterator, the elements will come out in some order although no guarantees can be made about it. It might change from run to run, or even from one iteration to the next (though this would be unusual in practice -- unless the underlying collection is modified).

If the exact iteration order needs to be preserved, I don't believe there's a way to preserve it into the result Map without iterating the input Collection sequentially.

If, however, the exact iteration order isn't important, and the requirement is that the output Map have unique values for each input element, then it would be possible to do something like this in parallel:

Collection<String> col = apiCall();
Iterator<String> iter = col.iterator();

Map<String, Integer> map =
    IntStream.range(0, col.size())
        .parallel()
        .boxed()
        .collect(toConcurrentMap(i -> { synchronized (iter) { return iter.next(); }},
                                 i -> i));

This is now far from a one-liner. It's also not clear to me how useful it is. :-) But it does demonstrate that it's possible to do something like this in parallel. Note that we've had to synchronize access to the input collection's iterator since it will be called from multiple threads. Also note that this is an unusual use of the iterator, since we never call hasNext and we assume that it is safe to call next exactly the number of times returned by the input collection's size().

Stuart Marks
2

Based on maba’s answer the general solution is:

collection.stream().forEachOrdered(i -> map.put(i, map.size()));

From the documentation of void forEachOrdered(Consumer<? super T> action):

This operation processes the elements one at a time, in encounter order if one exists.

The important aspect here is that it retains the order if there is one, e.g. if the Collection is a SortedSet or a List. Such a stream is called an ordered stream (not to be confused with a sorted stream). It might invoke the consumer from different threads, but it always ensures the “one at a time” and thread-safety guarantee.

Of course, it won’t benefit from parallel execution if the stream is parallel.
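As a rough, self-contained sketch of the `forEachOrdered` variant: the class and method names below are hypothetical, and the input is assumed to contain distinct elements (a duplicate would overwrite its earlier entry and leave a gap in the indices).

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class; the names are illustrative only.
final class ForEachOrderedIndex {

    static <T> Map<T, Integer> toIndexMap(Collection<T> col) {
        Map<T, Integer> map = new HashMap<>();
        // forEachOrdered performs the action one element at a time, in encounter
        // order, so the plain HashMap is never modified concurrently here,
        // even though the stream is parallel.
        col.parallelStream().forEachOrdered(e -> map.put(e, map.size()));
        return map;
    }

    public static void main(String[] args) {
        System.out.println(toIndexMap(Arrays.asList("a", "b", "c")));
    }
}
```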


For completeness, here is a solution that also works on parallel streams and actually makes use of the parallel processing, provided the stream is still ordered:

stream.collect(HashMap::new, (m, i) -> m.put(i, m.size()),
  (a, b) -> {int offset = a.size(); b.forEach((k, v) -> a.put(k, v + offset));});
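To make the role of the combiner easier to see, here is the same three-argument `collect` wrapped in a small runnable sketch. The class and method names are hypothetical, and it assumes an ordered source with distinct elements: each thread numbers its own segment from zero, and the combiner shifts the right-hand segment by the size of the left-hand one.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class; the names are illustrative only.
final class ParallelIndexCollect {

    static <T> Map<T, Integer> toIndexMap(List<T> list) {
        return list.parallelStream().collect(
                HashMap<T, Integer>::new,
                // Accumulator: index within the current segment is the segment's size so far.
                (m, e) -> m.put(e, m.size()),
                // Combiner: append the right segment after the left one by offsetting its indices.
                (left, right) -> {
                    int offset = left.size();
                    right.forEach((k, v) -> left.put(k, v + offset));
                });
    }

    public static void main(String[] args) {
        List<String> input = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            input.add("item" + i);
        }
        System.out.println(toIndexMap(input));
    }
}
```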
Holger
1

If you don't mind using third-party libraries, my cyclops-react library has extensions for all JDK Collection types, with a large number of powerful operations attached, so you could implement this like so:

    CollectionX<String> col = CollectionX.fromCollection(orgCol);
    col.zipWithIndex()
       .toMap(k->k.v1, v->v.v2);

cyclops-react Collection extensions are eager, so you would get better performance with our Stream extension, ReactiveSeq. It extends jOOλ's Seq, which in turn is an extension of the JDK's java.util.stream.Stream, and it also implements the reactive-streams API.

    ReactiveSeq.fromCollection(col)
               .zipWithIndex()
               .toMap(k->k.v1, v->v.v2);
John McClean
1

You can use AtomicInteger as index in stream:


    Collection<String> col = Arrays.asList("a", "b", "c");
    AtomicInteger index = new AtomicInteger(0);
    Map<String, Integer> collectedMap = col.stream().collect(Collectors.toMap(Function.identity(), el -> index.getAndIncrement()));
    System.out.println("collectedMap = " + collectedMap);
K. Gol
0

Try

    int[] pos = { 0 };
    list.forEach( a -> map.put(a, pos[0]++));
Syam S
  • 2
    How do you know the collection doesn't use a parallel stream? – fabian Jul 30 '14 at 15:54
  • @fabian: I didn't understand the question. – Syam S Jul 30 '14 at 15:55
  • @fabian A `Stream` doesn't seem to be involved here. – Sotirios Delimanolis Jul 30 '14 at 15:59
  • Streams are not required to run on a single thread. Therefore using `forEach` on a stream may result in parallel execution of the lambda expression. – fabian Jul 30 '14 at 16:00
  • I just used `list.forEach`, so it doesn't involve any parallelism. If I wanted to do it in parallel I would use streams, like `list.stream().parallel().forEach( a -> map.put(a, pos[0]++));`. In that case the values may not follow the order of iteration. – Syam S Jul 30 '14 at 16:00
  • As fabian says, there's a problem here if the `Stream` is parallel: the increment of an array element isn't atomic, so you might get the same index twice (use `AtomicInteger`). But `AtomicInteger` won't help either, because the OP requested consecutive indices in iteration order, not in random order. And `forEach` doesn't guarantee that the stream is processed serially rather than in parallel. – Dmitry Ginzburg Jul 30 '14 at 16:01
  • 2
    @SyamS: The same holds for `Iterable.forEach`, as clarified by this sentence from the javadoc: "**Unless otherwise specified by the implementing class**, actions are performed in the order of iteration". This sentence tells you the order of execution may not be the same as in the iterator. And the question doesn't say what kind of collection is used. – fabian Jul 30 '14 at 16:07
  • @fabian: Thank you. I didn't know that. My example was using ArrayList. That is why it worked for me. :) – Syam S Jul 30 '14 at 16:09
  • I don’t get the first line; what is `List list = new ArrayList(Arrays.asList(new String[]{"a", "b", "c"}));` supposed to do? It doesn’t seem to have any advantage over `List list = Arrays.asList("a", "b", "c");` – Holger Jul 31 '14 at 14:05
  • @Holger: `Arrays.asList()` returns `java.util.Arrays$ArrayList`, which doesn't implement all the `AbstractList` methods. For example, `list.add(2, "d");` would result in a `java.lang.UnsupportedOperationException`. – Syam S Jul 31 '14 at 14:13
  • Which piece? The initialization? It's one way of initializing a List inline. – Syam S Jul 31 '14 at 14:23
  • @Syam S: your answer contains a piece of code providing a problem solution and its first line is unnecessarily complex regarding what this piece of code does. I simply asked why you didn’t just use `List list = Arrays.asList("a", "b", "c");` instead. Apparently you don’t have a reason. – Holger Jul 31 '14 at 14:37
  • Oops.. I just tried to give a complete example. Updated my answer now. :) – Syam S Jul 31 '14 at 14:40