stream.parallel().skip(1) 

vs

stream.skip(1).parallel() 

This is about Java 8 streams.
Do both of these skip the first line/entry?

The example is something like this:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.concurrent.atomic.AtomicLong;

public class Test010 {

    public static void main(String[] args) {
        String message = 
        "a,b,c\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n1,2,3\n4,5,6\n7,8,9\n";

        try(BufferedReader br = new BufferedReader(new StringReader(message))){

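            AtomicLong cnt = new AtomicLong(1); // counts the printed rows

            // skip the header line "a,b,c", process the remaining lines in parallel
            br.lines().parallel().skip(1).forEach(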
                s -> {
                    System.out.println(cnt.getAndIncrement() + "->" + s);
                }
            );

        }catch (IOException e) {
            e.printStackTrace();
        }

    }

}

Earlier today, I was sometimes getting the header line "a,b,c" in the lambda expression. This surprised me, since I expected it to have been skipped already. Now I cannot reproduce it, i.e. the header line never reaches the lambda expression. So I am pretty confused; maybe something else was influencing that behavior. Of course, this is just an example. In the real world, the message is read from a CSV file and contains the full content of that file.

peter.petrov
  • Did you try both of them to find out? – Andrew Jun 28 '16 at 15:33
  • The answer to your question is yes. Read the [API note](https://docs.oracle.com/javase/8/docs/api/java/util/stream/Stream.html#skip-long-) – 4castle Jun 28 '16 at 15:35
  • I tried the first option, and it seems to me (I am actually pretty sure) that it is not skipping the first line/entry, but some random entry instead. – peter.petrov Jun 28 '16 at 15:41
  • Perhaps you are stumbling upon [“Stream.skip behavior with unordered terminal operation”](http://stackoverflow.com/q/30843279/2711488). Check which exact Java version you are using. – Holger Jun 28 '16 at 17:01
  • @Andrew While trying them both could prove them to be different, it is useless for determining if they _must_ be the same, which is really what the OP is asking. He doesn't mean "might it sometimes produce the same result", but "must it produce the same result." And trying it out doesn't really help with that; you have to appeal to the _specification_. – Brian Goetz Jun 28 '16 at 19:15
  • @BrianGoetz Yes, that's basically what I meant. I posted an example but now for some reason I cannot get the same behavior I saw earlier today. – peter.petrov Jun 28 '16 at 20:23
  • Side note after reading Holger's answer: I think my question is just fine :) not sure why it got quickly downvoted. – peter.petrov Jun 29 '16 at 15:07

2 Answers


You actually have two questions in one: the first is whether it makes a difference to write stream.parallel().skip(1) or stream.skip(1).parallel(), the second is whether either or both will always skip the first element. See also “loaded question”.

The first answer is that it makes no difference, because specifying a .sequential() or .parallel() execution policy affects the entire Stream pipeline, regardless of where you place it in the call chain. The only exception is when you specify multiple contradicting policies; then the last one wins.
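
A small sketch of the "whole pipeline" rule: BaseStream.isParallel() reports the execution policy that will be used once a terminal operation runs, so the position of .parallel() in the chain should make no difference, and with contradicting calls the last one wins. The class name ExecutionPolicyDemo and the sample rows are illustrative only.

```java
import java.util.stream.Stream;

public class ExecutionPolicyDemo {

    // parallel() placed after skip(1): the whole pipeline is still parallel
    static boolean skipThenParallel() {
        return Stream.of("a,b,c", "1,2,3", "4,5,6").skip(1).parallel().isParallel();
    }

    // parallel() placed before skip(1): same policy for the whole pipeline
    static boolean parallelThenSkip() {
        return Stream.of("a,b,c", "1,2,3", "4,5,6").parallel().skip(1).isParallel();
    }

    // Contradicting policies: the last call (sequential) wins
    static boolean parallelThenSequential() {
        return Stream.of("a,b,c", "1,2,3", "4,5,6").parallel().skip(1).sequential().isParallel();
    }

    public static void main(String[] args) {
        System.out.println(skipThenParallel());       // true
        System.out.println(parallelThenSkip());       // true
        System.out.println(parallelThenSequential()); // false
    }
}
```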

So in either case you are requesting parallel execution, which might affect the outcome of the skip operation; that is the subject of the second question.

The answer is not that simple. If the Stream has no defined encounter order in the first place, an arbitrary element might get skipped, which is a consequence of the fact that there is no “first” element, even if there might be an element you encounter first when iterating over the source.
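
Whether a stream has a defined encounter order can be checked through the ORDERED characteristic of its spliterator. The following sketch (hypothetical class EncounterOrderDemo, made-up rows) contrasts a List, which is ordered, with a HashSet, which is not; for the latter, skip(1) has no "first" element to honor.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Spliterator;

public class EncounterOrderDemo {

    static final List<String> ROWS = Arrays.asList("a,b,c", "1,2,3", "4,5,6");

    // A List has a defined encounter order, so its stream reports ORDERED
    static boolean listStreamIsOrdered() {
        return ROWS.stream().spliterator().hasCharacteristics(Spliterator.ORDERED);
    }

    // A HashSet has no defined encounter order: there is no "first" row
    // for skip(1) to drop, so an arbitrary one may be skipped
    static boolean hashSetStreamIsOrdered() {
        return new HashSet<>(ROWS).stream().spliterator().hasCharacteristics(Spliterator.ORDERED);
    }

    public static void main(String[] args) {
        System.out.println(listStreamIsOrdered());    // true
        System.out.println(hashSetStreamIsOrdered()); // false
    }
}
```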

If you have an ordered Stream, skip(1) should skip the first element, but this has been laid down only recently. As discussed in “Stream.skip behavior with unordered terminal operation”, chaining an unordered terminal operation had an effect on the skip operation in earlier implementations, and there was some uncertainty about whether this could even be intentional, as visible in “Is this a bug in Files.lines(), or am I misunderstanding something about parallel streams?”. That question happens to be close to your code; apparently skipping the first line is a common case.

The final word is that the behavior of earlier JREs is a bug and skip(1) on an ordered stream should skip the first element, even when the stream pipeline is executed in parallel and the terminal operation is unordered. The associated bug report names jdk1.8.0_60 as the first fixed version, which I could verify. So if you are using an older implementation, you might see the Stream skipping different elements when using .parallel() with the unordered .forEach(…) terminal operation. It is not a contradiction if the implementation occasionally skips the expected element; that’s the unpredictability of multi-threading.
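
A minimal check of the fixed behavior, assuming JDK 8u60 or newer: an ordered source run in parallel with skip(1) and the unordered forEach should never hand the header to the action. The helpers csvLines and seenByForEach are hypothetical names for this sketch; on the older JREs described above, the header could occasionally show up in place of another row.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class ParallelSkipDemo {

    static final String HEADER = "a,b,c";

    // Hypothetical helper: a header followed by `rows` distinct data lines
    static List<String> csvLines(int rows) {
        return Stream.concat(Stream.of(HEADER),
                             IntStream.rangeClosed(1, rows).mapToObj(i -> "row" + i))
                     .collect(Collectors.toList());
    }

    // forEach may run the action on several threads, so use a concurrent set
    static Set<String> seenByForEach(List<String> lines) {
        Set<String> seen = ConcurrentHashMap.newKeySet();
        lines.stream().parallel().skip(1).forEach(seen::add);
        return seen;
    }

    public static void main(String[] args) {
        List<String> lines = csvLines(10_000);
        for (int run = 0; run < 100; run++) {
            Set<String> seen = seenByForEach(lines);
            if (seen.contains(HEADER) || seen.size() != 10_000) {
                System.out.println("unexpected result in run " + run);
                return;
            }
        }
        System.out.println("header was skipped in every run");
    }
}
```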

So the answer still is that stream.parallel().skip(1) and stream.skip(1).parallel() behave the same, even in earlier versions, as both are equally unpredictable when used with an unordered terminal operation like forEach. With ordered Streams they should always skip the first element, and with 1.8.0_60 or newer, they do.
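
The claim that both spellings behave the same can be illustrated with an ordered terminal operation: collect(Collectors.toList()) keeps the encounter order, so both pipelines should yield the same rows in the same order. SameResultDemo and its sample data are illustrative.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class SameResultDemo {

    static final List<String> ROWS = Arrays.asList("a,b,c", "1,2,3", "4,5,6", "7,8,9");

    static List<String> parallelThenSkip() {
        return ROWS.stream().parallel().skip(1).collect(Collectors.toList());
    }

    static List<String> skipThenParallel() {
        return ROWS.stream().skip(1).parallel().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(parallelThenSkip());                            // [1,2,3, 4,5,6, 7,8,9]
        System.out.println(parallelThenSkip().equals(skipThenParallel())); // true
    }
}
```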

Holger
  • Thanks. So ... is my stream `br.lines()` ordered (the way I create it)? I used 1.8.0_25 btw when I noticed that behavior. What about the case: ordered stream and JVM earlier than 1.8.0_60? – peter.petrov Jun 29 '16 at 14:29
  • Yes, `BufferedReader.lines()` returns an ordered Stream. That’s why I said, it *should* work. Still, doing a single `readLine()` call on the `BufferedReader` *before* invoking `lines()` saves you from the costs of a (parallel) `skip` operation and works in all versions. As said, in JREs earlier than 1.8.0_60, you get surprising behavior if the *terminal operation* is unordered, which is the case with `forEach`. – Holger Jun 29 '16 at 14:34
  • Ah! Good idea. I didn't think that's possible (to call `readLine` and then to create the stream). I will try all these options. Thanks again for the detailed response. – peter.petrov Jun 29 '16 at 14:36
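
Holger's suggestion might look like the sketch below: readLine() consumes the header, and lines() then streams only the remaining lines, so no skip is needed at all. HeaderFirstDemo and dataLines are hypothetical names, and a concurrent set stands in for the real per-line work.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class HeaderFirstDemo {

    // Hypothetical helper: reads past the header, then processes the data lines in parallel
    static Set<String> dataLines(String csv) {
        try (BufferedReader br = new BufferedReader(new StringReader(csv))) {
            br.readLine(); // consume the header line; returns null if the input is empty
            Set<String> seen = ConcurrentHashMap.newKeySet(); // forEach may use several threads
            br.lines().parallel().forEach(seen::add);          // lines() starts after the header
            return seen;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Set<String> rows = dataLines("a,b,c\n1,2,3\n4,5,6\n7,8,9\n");
        System.out.println(rows.contains("a,b,c")); // false
        System.out.println(rows.size());            // 3
    }
}
```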

Yes, but with an ordered parallel stream, skip(n) gets slower as n gets larger.

Here's the API note from skip():

While skip() is generally a cheap operation on sequential stream pipelines, it can be quite expensive on ordered parallel pipelines, especially for large values of n, since skip(n) is constrained to skip not just any n elements, but the first n elements in the encounter order. Using an unordered stream source (such as generate(Supplier)) or removing the ordering constraint with BaseStream.unordered() may result in significant speedups of skip() in parallel pipelines, if the semantics of your situation permit. If consistency with encounter order is required, and you are experiencing poor performance or memory utilization with skip() in parallel pipelines, switching to sequential execution with BaseStream.sequential() may improve performance.

So essentially, if you want better performance with skip(), don't use a parallel stream, or use an unordered stream.
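
To illustrate the API note, here is a sketch comparing an ordered parallel skip(n), which must drop exactly the first n elements, with an unordered one, which may drop any n elements. Only the count of the unordered result is predictable; the class and method names are made up for this example.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class UnorderedSkipDemo {

    // Ordered parallel pipeline: skip(n) must drop exactly the first n elements
    static List<Integer> orderedSkip(int size, int n) {
        return IntStream.range(0, size).parallel()
                        .skip(n)
                        .boxed()
                        .collect(Collectors.toList());
    }

    // unordered() lifts the constraint: skip(n) may drop any n elements
    static Set<Integer> unorderedSkip(int size, int n) {
        return IntStream.range(0, size).parallel()
                        .unordered()
                        .skip(n)
                        .boxed()
                        .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        List<Integer> ordered = orderedSkip(1_000_000, 10);
        System.out.println(ordered.get(0));   // 10
        Set<Integer> unordered = unorderedSkip(1_000_000, 10);
        System.out.println(unordered.size()); // 999990
    }
}
```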


As for it seeming not to work with parallel streams, perhaps you're actually seeing that the elements are no longer printed in order? For example, this code:

Stream.of("Hello", "How", "Are", "You?")
    .parallel()
    .skip(1)
    .forEach(System.out::println);

printed this in one run (the order may differ between runs):

Are
You?
How


This is perfectly fine because forEach doesn't enforce the encounter order in a parallel stream. If you want it to enforce the encounter order, use a sequential stream (and perhaps use forEachOrdered so that your intent is obvious).

Stream.of("Hello", "How", "Are", "You?")
    .skip(1)
    .forEachOrdered(System.out::println);

prints:

How
Are
You?
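
forEachOrdered respects the encounter order even when the pipeline runs in parallel, which is why it fixes the output order (at the cost of most of the parallel benefit, as pointed out in the comments). A sketch, with the hypothetical helper skipFirstInOrder collecting into a plain ArrayList; that is safe here only because forEachOrdered guarantees that the action for one element happens-before the action for the next.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

public class ForEachOrderedDemo {

    static List<String> skipFirstInOrder() {
        List<String> out = new ArrayList<>(); // safe: forEachOrdered actions happen-before one another
        Stream.of("Hello", "How", "Are", "You?")
              .parallel()
              .skip(1)
              .forEachOrdered(out::add);
        return out;
    }

    public static void main(String[] args) {
        skipFirstInOrder().forEach(System.out::println); // How, Are, You? (one per line)
    }
}
```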

4castle
  • Thanks, but I don't really care about performance. I care whether I am skipping the first entry or some random one. – peter.petrov Jun 28 '16 at 15:55
  • @peter.petrov It's not skipping a random one. It still skips the same entries. Could you provide an MCVE of the problem? – 4castle Jun 28 '16 at 15:56
  • It is, your example is very limited. This question is about encounter order. How do I skip the first one in encounter order?! I think I found the solution, thanks. – peter.petrov Jun 28 '16 at 16:05
  • @peter.petrov Can you please share the solution? I'm confused about what's making you think it's not skipping the first one. – 4castle Jun 28 '16 at 16:19
  • My test and the printouts I added are making me think so. The solution is `forEachOrdered`. I may post an answer myself these days. Thanks for the help anyway. – peter.petrov Jun 28 '16 at 16:31
  • @peter.petrov Using `forEachOrdered` sacrifices the benefit of a parallel stream. (this is documented in [`forEach`](https://docs.oracle.com/javase/8/docs/api/java/util/stream/Stream.html#forEach-java.util.function.Consumer-)) Just use a sequential stream. – 4castle Jun 28 '16 at 17:10
  • @peter.petrov: recheck whether you are still talking about the same question. Your actual question was, whether it makes a difference if you write `.parallel().skip(1)` or `.skip(1).parallel()`, not how to enforce encounter order. – Holger Jun 28 '16 at 17:14
  • @Holger I posted an example. – peter.petrov Jun 28 '16 at 20:26
  • @peter.petrov Looking at your example, it seems you're fighting parallel streams and not actually reaping any rewards. You should switch to a sequential stream. – 4castle Jun 28 '16 at 20:29
  • @4castle I am getting some reward from parallelism since the data is sent in parallel, in the real world that's not a `System.out.println` call there but some send operation. And also... that's not the point here. – peter.petrov Jun 28 '16 at 20:31
  • @peter.petrov Okay, that sounds good then as long as the order doesn't matter when you're sending. Glad you got it working! – 4castle Jun 28 '16 at 20:37
  • @4castle Well, I would be happier if I got it non-working :) I don't understand anything now :) Thanks anyway! – peter.petrov Jun 28 '16 at 20:39