I have read a few posts, but I'm still confused.

I know that parallel streams are executed in parallel and make use of multiple CPU cores, and I believe that the sub-jobs are executed as atomic units. Am I correct?

But what about regular Java 8 streams?

Suppose I execute the following line of code:

users.stream().map(user->user.getUsername()).collect(Collectors.toList()); 

Will that line be executed in a thread-safe/atomic manner as well?

Moshe Arad
  • No, sub-jobs are not executed as atomic units, but it's not quite clear what you mean by that. Sequential streams are executed in a single thread, therefore they don't have to be thread-safe. – Marko Topolnik Oct 05 '16 at 18:16
  • So, if I have a multithreaded system and I'm using sequential streams, could I get data inconsistency due to a race condition? – Moshe Arad Oct 05 '16 at 19:40
  • Whether you have a race condition has nothing to do with the stream processing you do. As I said, it is very unclear what you have in mind. – Marko Topolnik Oct 05 '16 at 19:42
  • I'm using Java 8 streams in a multithreaded environment. I just want to know whether I may run into a race condition because of the use of `users.stream()` or `users.parallelStream()`. If one of these statements can lead to data inconsistency, I want to know how to avoid it. – Moshe Arad Oct 05 '16 at 19:59
  • You are as unclear as before. You can get into a race condition whether or not you use streams and whether or not they are parallel. – Marko Topolnik Oct 05 '16 at 20:03
  • Streams perform read operations. You run into problems when someone writes concurrently and you're not taking measures. As Marko Topolnik already pointed out, this is always the case, regardless of whether you use streams. – Holger Oct 05 '16 at 20:08
  • So it doesn't matter whether I use `users.stream()` or `users.parallelStream()`; I just need to make sure that internal operations like `user.getUsername()` do not perform write operations (in order to get a thread-safe line of code). Is that correct? – Moshe Arad Oct 05 '16 at 20:39
  • If the method `getUsername` is free of side effects, the stream operation will not introduce any side effects. However, to consider it thread safe, you have to ensure that *no one* is modifying the source list in *any* thread while the operation is in progress. You can only write a thread safe *program*, which is the entirety of *all* operations, not a thread safe operation. If your program is well-formed, the stream operation will be fine as well. – Holger Oct 06 '16 at 17:15

3 Answers

There is no such thing as general thread safety or atomicity. Atomic field updates are atomic only with respect to threads accessing the same variable; synchronized code blocks are executed atomically and thread-safely only with respect to threads synchronizing on the same instance.
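To make the "same variable, same instance" point concrete, here is a minimal sketch. The class and method names are hypothetical and not part of the question. Several threads increment one counter, and the total is reliable only because every thread synchronizes on the same lock object, or goes through the same `AtomicInteger`.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; all names here are illustrative only.
class SameMonitorSketch {

    // Every worker synchronizes on the SAME lock instance, so increments never interleave.
    static int countWithSharedLock(int threads, int perThread) {
        final Object lock = new Object();
        final int[] counter = {0};
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    synchronized (lock) {
                        counter[0]++;
                    }
                }
            });
            workers[t].start();
        }
        joinAll(workers);
        return counter[0];
    }

    // Every worker updates the SAME AtomicInteger, so each increment is atomic.
    static int countWithAtomic(int threads, int perThread) {
        AtomicInteger counter = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    counter.incrementAndGet();
                }
            });
            workers[t].start();
        }
        joinAll(workers);
        return counter.get();
    }

    // join() also makes the workers' writes visible to the calling thread.
    private static void joinAll(Thread[] workers) {
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(countWithSharedLock(4, 10_000));
        System.out.println(countWithAtomic(4, 10_000));
    }
}
```

If each worker created its own lock object instead, the `synchronized` blocks would no longer exclude one another and the guarantee would be gone.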

A stream operation is in itself a purely local operation. This holds even for parallel stream operations, as the threads participating in that operation are unrelated to any other threads. If you use functions with (non-local) side effects, which is strongly discouraged, there are no guarantees: the stream adds neither thread safety nor atomicity for these side effects. The only exceptions are the terminal operations forEach and forEachOrdered, which are intended for producing side effects and whose behavior with multiple threads is well documented.
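As an illustration of why non-local side effects are discouraged, the following sketch contrasts two ways of gathering results from a parallel stream. The class and method names are made up for this example. The first variant mutates a shared `ArrayList` from several threads and may lose elements or throw. The second lets `collect` do the accumulation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical demo class; names are illustrative only.
class SideEffectSketch {

    // Discouraged: the lambda writes to a shared, non-thread-safe list.
    // With a parallel stream this is a data race; the result is unpredictable.
    static List<Integer> squaresViaSideEffect(int n) {
        List<Integer> out = new ArrayList<>();
        IntStream.range(0, n).parallel().forEach(i -> out.add(i * i));
        return out;
    }

    // Preferred: no shared mutable state; the collector handles accumulation
    // and combines the partial results of the worker threads.
    static List<Integer> squaresViaCollect(int n) {
        return IntStream.range(0, n)
                .parallel()
                .map(i -> i * i)
                .boxed()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(squaresViaCollect(10));
    }
}
```

The `collect` variant behaves the same whether the stream is sequential or parallel, which is the property the side-effect variant lacks.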

So the operation users.stream().map(user->user.getUsername()).collect(Collectors.toList()), assuming that the method getUsername() follows the contract and has no side effects, is not visible to any other thread at all. If you publish the returned list to other threads in a thread-safe way, it will be safe; if you let it escape in an unsafe way, there are no guarantees. If you never publish the result to other threads, the question becomes irrelevant.
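One possible way to publish the result safely is sketched below, under stated assumptions: `User` is a hypothetical stand-in for the question's class, and a `Future` is only one of several hand-off mechanisms that would work (a `volatile` field or a concurrent queue are others). The `java.util.concurrent` package documentation states that actions taken by the task happen-before another thread's return from `Future.get()`, so the caller sees a fully built list.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;

// Hypothetical demo class; names are illustrative only.
class SafePublicationSketch {

    // Assumed stand-in for the question's User type: immutable, side-effect-free getter.
    static final class User {
        private final String username;
        User(String username) { this.username = username; }
        String getUsername() { return username; }
    }

    // Runs the stream in a worker thread and hands the list back through a Future.
    static List<String> usernamesFromWorker(List<User> users) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Callable<List<String>> task = () ->
                    users.stream().map(User::getUsername).collect(Collectors.toList());
            Future<List<String>> future = pool.submit(task);
            return future.get(); // safe publication point
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<User> users = Arrays.asList(new User("alice"), new User("bob"));
        System.out.println(usernamesFromWorker(users));
    }
}
```

This only covers the result. The `users` list itself must still not be modified by any thread while the task runs.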

Holger
  • And what if `users` is an instance variable, a List that is being modified by other threads while the stream is being processed? – Ekaterina Nov 07 '21 at 19:09
  • @Ekaterina I think it’s clear that if `users` is potentially modified by other threads, it’s your duty to ensure that these modifications do not interfere with the Stream operation. The rules are the same as for any other operation that iterates over `users`. – Holger Nov 08 '21 at 08:30

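In general, no. If the Spliterator used has the CONCURRENT characteristic, then the stream is thread-safe in the sense that its source may be modified concurrently by other threads without external synchronization.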

Kayaman

The Stream API defines numerous contracts for each step of the pipeline. If any of them is violated, unpredictable behavior or exceptions may result.

  • Spliterator: note the late-binding, IMMUTABLE and CONCURRENT properties, which can differ between sources.
  • Collection sources usually specify the nature of their spliterators; e.g. ConcurrentHashMap reports CONCURRENT for its views, while HashMap does not. This means that a HashMap cannot handle modification by outside threads or interfering side effects within the stream pipeline.
  • Each operation in a pipeline defines which properties the user-supplied functions should have. The key concepts are non-interference, statefulness and side effects. Compare filter and peek.
  • Collector/Collectors don't spell out their requirements quite as clearly, but the supplied functions should generally be non-interfering, side-effect-free and stateless, just like those of intermediate ops, especially when concurrent collectors are used.
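The spliterator properties mentioned in the list can be inspected directly. The sketch below uses hypothetical helper names and asks a few standard collections whether their spliterators report CONCURRENT or IMMUTABLE via `Spliterator.hasCharacteristics`.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.Spliterator;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical demo class; helper names are illustrative only.
class SpliteratorCharacteristicsSketch {

    // True if the source claims it may be modified concurrently without external synchronization.
    static boolean reportsConcurrent(Collection<?> source) {
        return source.spliterator().hasCharacteristics(Spliterator.CONCURRENT);
    }

    // True if the source claims its elements cannot be structurally modified during traversal.
    static boolean reportsImmutable(Collection<?> source) {
        return source.spliterator().hasCharacteristics(Spliterator.IMMUTABLE);
    }

    public static void main(String[] args) {
        System.out.println("ConcurrentHashMap.keySet() CONCURRENT: "
                + reportsConcurrent(new ConcurrentHashMap<String, Integer>().keySet()));
        System.out.println("HashMap.keySet() CONCURRENT: "
                + reportsConcurrent(new HashMap<String, Integer>().keySet()));
        System.out.println("ArrayList CONCURRENT: "
                + reportsConcurrent(new ArrayList<String>()));
        System.out.println("CopyOnWriteArrayList IMMUTABLE: "
                + reportsImmutable(new CopyOnWriteArrayList<String>()));
    }
}
```

A source that reports neither characteristic, such as `ArrayList` or `HashMap`, has to be protected from modification for the whole duration of the terminal operation.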

In general, if you do everything right then both parallel and sequential streams are safe to use, even on collections that do not support concurrent modification.

If you're doing things that violate these requirements then sequential streams may be a little more forgiving, but they may still fail.
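A small sketch of such a violation in a purely sequential stream, with made-up class and method names: the lambda modifies the very list being streamed. `ArrayList`'s fail-fast spliterator typically reports this with a `ConcurrentModificationException`, but that check is documented as best-effort, so it should not be relied on. The second method is a non-interfering alternative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo class; names are illustrative only.
class InterferenceSketch {

    // Violates non-interference: the pipeline modifies its own source.
    // Returns true if the violation was detected and reported as an exception.
    static boolean interferingPipelineThrows(List<String> names) {
        try {
            names.stream().forEach(n -> names.add(n + "!"));
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Non-interfering alternative: leave the source alone and build a new list.
    static List<String> withSuffix(List<String> names) {
        return names.stream().map(n -> n + "!").collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>(Arrays.asList("a", "b"));
        System.out.println("interference detected: " + interferingPipelineThrows(names));
        System.out.println(withSuffix(Arrays.asList("a", "b")));
    }
}
```

No second thread is involved here, which shows that the stream contracts are about interference with the source in general, not only about concurrency.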

the8472