4

I'm creating a list and fill it from an other list using parallel stream, unexpectedly destination list contains nulls. It happens seldom and inconstantly. Does someone have the same issue?

Here is the piece of code:

Collection<DestinationObj> DestinationObjList = Lists.newArrayList();
SourceObjList.parallelStream().forEach(portalRule -> DestinationObjList.add(new DestinationObj(portalRule)));
return DestinationObjList;
Pavel Leonov
  • 346
  • 1
  • 11
  • 4
    Your `ArrayList` is not thread-safe. – Flown Apr 05 '17 at 11:54
  • 4
    https://docs.oracle.com/javase/tutorial/collections/streams/parallelism.html see the end of the page ;) –  Apr 05 '17 at 11:55
  • 2
    By the way you should use `.collect(...)` rather than doing `.forEach`. I suspect that it might work for parallel stream even with not thread-safe list (as the collector would be handling the synchronisation), but you will have to verify it. – Jaroslaw Pawlak Apr 05 '17 at 11:59

3 Answers3

3

You should collect in parallel in a bit different way:

   SourceObjList.parallelStream()
        .map(DestinationObj::new)
        .collect(Collectors.toCollection(ArrayList::new));

The problem you are having is that ArrayList is not thread-safe and as such the result is really un-defined. Notice that using a parallel stream does not require a thread-safe collection - Lists::newArrayList is not.

Eugene
  • 117,005
  • 15
  • 201
  • 306
  • 1
    By the way, I would not omit the `DestinationObjList = ` assignment of the result, as the pattern differs from the OP’s `forEach` approach fundamentally, so we should not treat it as implied. – Holger Apr 05 '17 at 12:22
2

Using a collector to synchronize access to the destination list gives you a performance penalty in the synchronization. In fact, you can do the same thing without synchronization, since you know the size of the source list and can therefore create a destination list of the required size from the start.

DestinationObj[] dest = new DestinationObj[sourceObjList.size()];
IntStream.range(0, sourceObjList.size())
    .parallel()
    .forEach(i -> dest[i] = new DestinationObj(sourceObjList.get(i)));
List<DestinationObj> destinationObjList = Arrays.asList(dest);

EDIT: Just putting Holger's improvement here for clarity:

List<DestinationObj> destinationObjList = Arrays.asList(
        sourceObjList
            .parallelStream()
            .map(DestinationObj::new)
            .toArray(DestinationObj[]::new));
Klitos Kyriacou
  • 10,634
  • 2
  • 38
  • 70
  • 3
    The array approach might be slightly more efficient than a collector in this case, but this doesn’t imply that you have to do it by hand, `DestinationObj[] dest = sourceObjList.parallelStream() .map(DestinationObj::new).toArray(DestinationObj[]::new)` will do the same, also exploiting the known result size, but still continuing to work, if you change the operation in a way that the size is not predictable anymore. It also allows to do it without a temporary variable: `List l=Arrays.asList(sourceObjList.parallelStream() .map(DestinationObj::new).toArray(DestinationObj[]::new));` – Holger Apr 05 '17 at 12:46
  • Good point, @Holger. Is `.toArray` guaranteed not to add synchronization in cases such as this, where the source stream has known size? It should also be possible to avoid synchronization when using `collect(Collectors.toCollection(ArrayList::new))` as in Eugene's answer. It can create an appropriately sized ArrayList and use `add(index, element)` thus removing the need for synchronization. In which case, my answer is pointless. – Klitos Kyriacou Apr 05 '17 at 13:54
  • 2
    A `Collector` doesn’t work via synchronization. It works by using the [`Supplier`](https://docs.oracle.com/javase/8/docs/api/java/util/stream/Collector.html#supplier--) to create thread-local containers to perform isolated partial evaluation. In this specific example, it means that each thread will collect a part of the data into its own `ArrayList`, then, the partial results have to be merged via the [combiner `Function`](https://docs.oracle.com/javase/8/docs/api/java/util/stream/Collector.html#combiner--), which is equivalent to `addAll`… – Holger Apr 05 '17 at 14:10
  • 3
    So, the `Supplier` doesn’t know the size in advance and in case of `toCollection(ArrayList::new)`, the merging is as expensive as the gain of collecting partial results in parallel. `toArray` clearly is more efficient here. – Holger Apr 05 '17 at 14:11
0

There is a concurrent list implementation in java.util.concurrent. CopyOnWriteArrayList in particular.

-- Jarrod Roberson

Look here: Is there a concurrent List in Java's JDK?

Community
  • 1
  • 1
Sergey
  • 176
  • 2
  • 12
  • 4
    Filling a `CopyOnWriteArrayList` this way, is the one of the most inefficient ways to get a `List` possible… – Holger Apr 05 '17 at 12:53