15

Does the specification guarantee that all operations on sequential Java Streams are executed in the current thread? (Except for "forEach" and "forEachOrdered")

I explicitly ask for the specification, not what the current implementation does. I can look into the current implementation myself and don't need to bother you with that. But the implementation might change and there are other implementations.

I'm asking because of ThreadLocals: I use a framework that uses ThreadLocals internally. Even a simple call like company.getName() eventually uses a ThreadLocal. I cannot change how that framework is designed, at least not within a sane amount of time.
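To make the concern concrete, here is a minimal sketch of the situation. The class, the `CONTEXT` field, and `getName` are hypothetical stand-ins for the framework, which is not named here. The mapper only works if it runs on the thread that bound the value. As far as I can tell that is what current OpenJDK builds do for sequential streams, but the sketch shows observed behavior, not a guarantee from the specification.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-in for a framework whose getters read a ThreadLocal.
class ThreadLocalStreamSketch {
    // Hypothetical per-thread context; a real framework would hide this internally.
    static final ThreadLocal<String> CONTEXT = new ThreadLocal<>();

    // Fails if called on a thread that has no context bound.
    static String getName(String id) {
        String ctx = CONTEXT.get();
        if (ctx == null) {
            throw new IllegalStateException(
                    "no context bound to thread " + Thread.currentThread().getName());
        }
        return ctx + ":" + id;
    }

    // Sequential pipeline whose mapper depends on the ThreadLocal.
    static List<String> namesOf(List<String> ids) {
        return ids.stream()
                .map(ThreadLocalStreamSketch::getName)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        CONTEXT.set("tenant-1");
        try {
            System.out.println(namesOf(Arrays.asList("a", "b")));
        } finally {
            CONTEXT.remove(); // avoid leaking the value into pooled threads
        }
    }
}
```

If the mapper were ever run on a different thread, `getName` would throw instead of returning a value, which is exactly the failure mode the question is about.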

The specification seems confusing here. The documentation of the Package "java.util.stream" states:

If the behavioral parameters do have side-effects, unless explicitly stated, there are no guarantees as to the visibility of those side-effects to other threads, nor are there any guarantees that different operations on the "same" element within the same stream pipeline are executed in the same thread.

...

Even when a pipeline is constrained to produce a result that is consistent with the encounter order of the stream source (for example, IntStream.range(0,5).parallel().map(x -> x*2).toArray() must produce [0, 2, 4, 6, 8]), no guarantees are made as to the order in which the mapper function is applied to individual elements, or in what thread any behavioral parameter is executed for a given element.
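The quoted example can be tried directly. The sketch below (class and field names are my own) runs the pipeline from the documentation and records which threads the mapper ran on. The result array is fixed by the encounter order; the set of thread names is not, and may differ from run to run.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.IntStream;

class EncounterOrderSketch {
    // Names of the threads that executed the mapper; a concurrent set,
    // because the mapper may be called from several threads at once.
    static final Set<String> THREADS = ConcurrentHashMap.newKeySet();

    // The pipeline from the package documentation, with thread recording added.
    static int[] doubled() {
        return IntStream.range(0, 5).parallel()
                .map(x -> {
                    THREADS.add(Thread.currentThread().getName());
                    return x * 2;
                })
                .toArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(doubled())); // [0, 2, 4, 6, 8]
        System.out.println("mapper ran on: " + THREADS); // varies between runs
    }
}
```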

I would interpret that as: Every operation on a stream can happen in a different thread. But the documentation of "forEach" and "forEachOrdered" explicitly states:

For any given element, the action may be performed at whatever time and in whatever thread the library chooses.

That statement would be redundant if every stream operation could happen in an unspecified thread. Is the opposite therefore true: are all operations on a sequential stream guaranteed to be executed in the current thread, except for "forEach" and "forEachOrdered"?

I have googled for an authoritative answer about the combination of "Java", "Stream" and "ThreadLocal" but found nothing. The closest thing was an answer by Brian Goetz to a related question here on Stack Overflow, but it is about the order, not the thread, and it is only about "forEach", not the other stream methods: Does Stream.forEach respect the encounter order of sequential streams?

Community
  • 1
  • 1
user194860
  • that `forEach` documentation has right at the beginning *For parallel stream pipelines...*, this only concerns *parallel* processing; even if there are two sentences – Eugene May 23 '18 at 14:56
  • 3
    About the "For parallel stream pipelines...": In the other question I have linked, Brian Goetz states that the restriction "For parallel stream pipelines" only applies to this one sentence. The sentence that I quoted is not restricted by it. (If I understand him correctly.) – user194860 May 23 '18 at 15:11
  • you're right; even worse (for me) is that at the end he states *the spec is still perfectly clear with that sentence removed entirely*.. So if we remove it, what you are saying about `forEach` is perfectly sane. – Eugene May 23 '18 at 15:12
  • it would have been great if the spec had: "the thread calling the terminal operation for sequential streams is the one doing the intermediate operations" or something like that; unless their intention (and some potential future implementation) is entirely different and this was left out on purpose. good question – Eugene May 23 '18 at 15:15
  • 2
    Excellent question indeed - note that this sentence has also been added to the [JavaDocs of iterate in java 9](https://docs.oracle.com/javase/9/docs/api/java/util/stream/Stream.html#iterate-T-java.util.function.UnaryOperator-), which did not contain it in [Java 8](https://docs.oracle.com/javase/8/docs/api/java/util/stream/Stream.html#iterate-T-java.util.function.UnaryOperator-) – Hulk May 24 '18 at 08:03
  • 1
    Related: https://stackoverflow.com/q/45871618/2513200 (perhaps even duplicate), but I'll admit that the only answer there doesn't really convince me – Hulk May 24 '18 at 08:45
  • 4
    @Hulk I think it's only up to Stuart Marks or Brian Goetz to answer this, eagerly waiting... – Eugene May 24 '18 at 11:55

1 Answer

1

I believe the answer you are looking for is not so well defined, as it depends on the consumer and/or spliterator and their characteristics.

Before reading the main quote:

https://docs.oracle.com/javase/8/docs/api/java/util/Collection.html#stream

default Stream<E> stream()

Returns a sequential Stream with this collection as its source. This method should be overridden when the spliterator() method cannot return a spliterator that is IMMUTABLE, CONCURRENT, or late-binding. (See spliterator() for details.)

https://docs.oracle.com/javase/8/docs/api/java/util/Spliterator.html#binding

Despite their obvious utility in parallel algorithms, spliterators are not expected to be thread-safe; instead, implementations of parallel algorithms using spliterators should ensure that the spliterator is only used by one thread at a time. This is generally easy to attain via serial thread-confinement, which often is a natural consequence of typical parallel algorithms that work by recursive decomposition. A thread calling trySplit() may hand over the returned Spliterator to another thread, which in turn may traverse or further split that Spliterator. The behaviour of splitting and traversal is undefined if two or more threads operate concurrently on the same spliterator. If the original thread hands a spliterator off to another thread for processing, it is best if that handoff occurs before any elements are consumed with tryAdvance(), as certain guarantees (such as the accuracy of estimateSize() for SIZED spliterators) are only valid before traversal has begun.

Spliterators and consumers have their own sets of characteristics, and those determine what is guaranteed. Suppose you are operating on a stream. Spliterators are not expected to be thread-safe, and they may hand elements over to other spliterators that may be traversed in another thread, whether the stream is sequential or not, so there is no guarantee. However, if no split occurs, the quotes lead to the following: under a single spliterator, the operations remain in the same thread. Any event that leads to a split voids that assumption; otherwise it holds.

Victor
  • 9
    how does this answer the question? – Eugene May 23 '18 at 19:48
  • Spliterators and consumers have their own sets of characteristics, and those determine what is guaranteed. Suppose you are operating on a stream. Spliterators are not expected to be thread-safe, and they may hand elements over to other spliterators that may be traversed in another thread, whether the stream is sequential or not, so there is no guarantee. However, if no split occurs, the quotes lead to the following: under a single spliterator, the operations remain in the same thread. Any event that leads to a split voids that assumption; otherwise it holds. – Victor May 24 '18 at 13:05
  • thanks, I will look further at this to try to explain it better when I get a break at work. For now I just copied my comment there. – Victor May 24 '18 at 13:35
  • I guess you could enforce all elements to be processed by a single thread by building your stream on an iterator that refuses to split (i.e. always returns `null` from `trySplit`), but I still don't see a reason why this single thread could not be some other thread than the one that invokes the terminal operation. – Hulk May 24 '18 at 13:54
  • Yes, I suppose you are correct, and this is the point where the docs give no further guarantee. The terminal operation is not required to handle the spliterator itself; it could hand all processing responsibility to a secondary thread or receive data from a secondary thread. I would imagine this would be a waste of resources, as the program flow would have to wait for the computation, unless the computation involves something like futures. So my conclusion would be a no, there is no guarantee with regard to the terminal operation. – Victor May 24 '18 at 15:58
  • 2
    @Hulk even then, it would be valid if `tryAdvance` is invoked on the same instance by different threads which coordinate their access, in other words, establish a *happens-before* relationship between these two invocations. Only *concurrent* access to the same spliterator is forbidden. – Holger May 24 '18 at 16:00
  • But on the other hand, while elements are being consumed sequentially and produced lazily, we could infer that the consumer is using the same thread, so that each element is consumed before the next is produced. But that does not mean that the thread cannot fork and wait while feeding the consumer with values; multiple threads would even be possible if the consumer is unordered and concurrent. But this is as far as the docs go, I believe. – Victor May 24 '18 at 16:05
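
The idea raised in the comments, a source that refuses to split, can be sketched roughly as follows. `NoSplit` and `unsplittable` are hypothetical names, not library API. With such a source every element should pass through the same `tryAdvance` loop, but nothing in the sketch forces that loop to run on the thread that invoked the terminal operation, which is the point made above.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Set;
import java.util.Spliterator;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

class NonSplittingSketch {
    // Hypothetical spliterator that never hands out a second spliterator.
    static final class NoSplit<T> implements Spliterator<T> {
        private final Iterator<T> it;

        NoSplit(Iterator<T> it) {
            this.it = it;
        }

        @Override
        public boolean tryAdvance(Consumer<? super T> action) {
            if (!it.hasNext()) {
                return false;
            }
            action.accept(it.next());
            return true;
        }

        @Override
        public Spliterator<T> trySplit() {
            return null; // refuse to split, so there is only ever one traversal
        }

        @Override
        public long estimateSize() {
            return Long.MAX_VALUE; // size unknown
        }

        @Override
        public int characteristics() {
            return ORDERED;
        }
    }

    // Builds a stream over the non-splitting source, optionally marked parallel.
    static <T> Stream<T> unsplittable(Iterable<T> source, boolean parallel) {
        return StreamSupport.stream(new NoSplit<>(source.iterator()), parallel);
    }

    public static void main(String[] args) {
        Set<String> threads = ConcurrentHashMap.newKeySet();
        List<Integer> out = unsplittable(Arrays.asList(1, 2, 3), true)
                .peek(x -> threads.add(Thread.currentThread().getName()))
                .map(x -> x * 2)
                .collect(Collectors.toList());
        // The thread reported here may or may not be the calling thread.
        System.out.println(out + " processed on " + threads);
    }
}
```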