
When we give some input to a parallel stream, the Stream API divides it into chunks, and the JVM uses multiple threads to execute the operations on those chunks.

Question:

If I pass an ArrayList with a million entries as input to a parallel pipeline, and an exception occurs in one of the JVM's internal threads after the first half of the list has been processed, how will the JVM handle the rollback?

Does the JVM really roll back to the original state?

Hasnain Ali Bohra
  • It will not revert anything; it is up to you to do that. – Eugene Nov 09 '17 at 11:57
  • Streams are not magic. Exceptions are relayed to the calling code. In a parallel computation context, this is represented by completing a ForkJoinTask exceptionally and throwing that exception from the collector later, in whichever thread that happens to be. JVMs do not handle rollbacks; there are no rollbacks from the standpoint of a JVM. An unhandled exception terminates the application with an error code. – M. Prokhorov Nov 09 '17 at 12:01
  • @M.Prokhorov it would be actually pretty interesting to adapt a Stream to have something like `onError`... – Eugene Nov 09 '17 at 12:03
  • @Eugene, I mean, there is RxJava, which is along those lines. I am very inclined to view Java's Stream class as a simplified RxObservable with only one channel. – M. Prokhorov Nov 09 '17 at 12:05
  • Yea ... but rollbacks would be unimplementable ... unless the rollback code was implemented by *your* classes. – Stephen C Nov 09 '17 at 12:05
  • Although thinking about it, no, Rx is not at all like Stream. – M. Prokhorov Nov 09 '17 at 12:07
  • @StephenC, implementing even a retry mechanism in Stream means condensing the meaning of a `Throwable` thrown inside a stream pipeline into a handful of meanings, which I doubt is possible, really. The metadata required for it would also pollute the pipeline declaration, and each stream stage is supposed to be atomic anyway, with each item processed on the same thread (not necessarily the one that submitted the item, but that's beside the point). So a stream retry is just a function that invokes another function up to three times, or whatever, if the delegate throws exceptions. – M. Prokhorov Nov 09 '17 at 12:14
  • The above can be implemented as a utility extension for streams, I'm pretty sure `RetryFunction` is already a thing in some function-based library. – M. Prokhorov Nov 09 '17 at 12:16
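The retry idea from the last two comments can be sketched as an ordinary function wrapper, with no change to the Stream API itself. `retrying`, `doubleWithRetries` and `RetryingFunctions` are hypothetical names for illustration (not JDK or library APIs), and the sketch assumes that only `RuntimeException`s are worth retrying. Note that a retry only repeats work for one element; it still does not undo anything.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.stream.Collectors;

final class RetryingFunctions {
    private RetryingFunctions() {}

    // Hypothetical helper, not a JDK API: returns a function that calls `delegate`
    // up to `maxAttempts` times for the same element and rethrows the last failure.
    static <T, R> Function<T, R> retrying(Function<? super T, ? extends R> delegate, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        return t -> {
            RuntimeException last = null;
            for (int attempt = 0; attempt < maxAttempts; attempt++) {
                try {
                    return delegate.apply(t);
                } catch (RuntimeException e) {
                    last = e; // remember the failure and try the same element again
                }
            }
            throw last; // every attempt failed: propagate as the pipeline normally would
        };
    }

    // Hypothetical demo: doubles each element, but the first call for any given
    // element fails (a stand-in for a transient error).
    static List<Integer> doubleWithRetries(List<Integer> input, int maxAttempts) {
        Set<Integer> failedOnce = ConcurrentHashMap.newKeySet();
        Function<Integer, Integer> flakyDouble = x -> {
            if (failedOnce.add(x)) {
                throw new IllegalStateException("transient failure for " + x);
            }
            return x * 2;
        };
        return input.parallelStream()
                .map(retrying(flakyDouble, maxAttempts))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(doubleWithRetries(Arrays.asList(1, 2, 3, 4), 3)); // [2, 4, 6, 8]
    }
}
```

With `maxAttempts` set to 1, the first failure simply propagates out of `collect`, exactly as it would without the wrapper.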

1 Answer


There is no such thing as a rollback, and under normal circumstances there is no need for one. Stream operations read from a source and produce a new result. In case of an exception, there is no new result, and any temporary objects created during processing will eventually be garbage collected.
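A minimal sketch of this point, assuming a hypothetical pipeline that fails at element 500,000 of a million-entry `ArrayList`, as in the question. The exception is relayed to the calling code, no partial result is returned, and the source list is left exactly as it was, because the pipeline only ever read from it. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

final class NoRollbackDemo {
    private NoRollbackDemo() {}

    // Builds an ArrayList with one million entries: 0, 1, ..., 999_999.
    static List<Integer> millionEntries() {
        return IntStream.range(0, 1_000_000).boxed()
                .collect(Collectors.toCollection(ArrayList::new));
    }

    // Runs a parallel pipeline that fails for one element part-way through.
    // Returns true if the exception reached the calling code.
    static boolean pipelineFails(List<Integer> source) {
        try {
            source.parallelStream()
                    .map(i -> {
                        if (i == 500_000) {
                            throw new IllegalStateException("failure at element " + i);
                        }
                        return i * 2;
                    })
                    .collect(Collectors.toList());
            return false; // not reached: the element 500_000 is always processed
        } catch (RuntimeException e) {
            // No partial result exists; intermediate objects simply become garbage.
            return true;
        }
    }

    public static void main(String[] args) {
        List<Integer> source = millionEntries();
        List<Integer> snapshot = new ArrayList<>(source);
        System.out.println("exception relayed to caller: " + pipelineFails(source)); // true
        System.out.println("source unchanged: " + source.equals(snapshot));           // true
    }
}
```

There is nothing to roll back here: the source was never modified in the first place.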

In a perfect world, the terminal operation would wait for the completion of all subtasks before relaying the exception to the caller, but currently it doesn't; see also this Q&A. But even if it did, the subtasks would continue processing items until they either reach the end of their workload or detect that the operation has been aborted, rather than rolling back their previous work.
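The sketch below (hypothetical class and method names) makes the "no undo" part visible with an `AtomicInteger` that counts how often the mapping function ran. After the exception reaches the caller, the counter is not reset. Whether it still grows afterwards depends on the JDK version and thread timing, so the example prints the numbers instead of predicting them.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;

final class NoUndoDemo {
    private NoUndoDemo() {}

    // Returns {count right after the exception reached us, count a little later}.
    static int[] processedCounts() {
        AtomicInteger processed = new AtomicInteger();
        try {
            IntStream.range(0, 1_000_000).parallel()
                    .map(i -> {
                        processed.incrementAndGet(); // side effect performed per element
                        if (i == 500_000) {
                            throw new IllegalStateException("failure at " + i);
                        }
                        return i * 2;
                    })
                    .max();
        } catch (RuntimeException e) {
            // expected: the pipeline always fails at element 500_000
        }
        int atFailure = processed.get();
        sleepQuietly(200); // give still-running subtasks time to finish or notice the abort
        int later = processed.get();
        return new int[] {atFailure, later};
    }

    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        int[] counts = processedCounts();
        // Neither number is reset to 0: work done before the failure is not rolled back.
        System.out.println("processed when exception arrived: " + counts[0]);
        System.out.println("processed a little later:         " + counts[1]);
    }
}
```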

Note that the documentation explicitly discourages using functions with stateful behavior, as the results may be nondeterministic or incorrect when using a parallel stream. Even without exceptions, a parallel stream may process elements that do not contribute to the final result when performing a short-circuiting operation. In either case, side effects produced by such a function can't be rolled back.
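As an illustration of the short-circuiting case, assuming a hypothetical search for the first value with `i % 1000 == 999`: `findFirst()` on a parallel stream must examine at least the elements 0 to 999, but other worker threads are usually already busy with later chunks, so the `peek` counter typically ends up above 1000 even though only one element contributes to the result. The exact count varies from run to run.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;

final class ShortCircuitDemo {
    private ShortCircuitDemo() {}

    // Returns {first element with i % 1000 == 999, number of elements that reached peek}.
    static int[] findFirstAndCount() {
        AtomicInteger examined = new AtomicInteger();
        int first = IntStream.range(0, 1_000_000).parallel()
                .peek(i -> examined.incrementAndGet()) // side effect for every element examined
                .filter(i -> i % 1000 == 999)
                .findFirst()                           // respects encounter order
                .getAsInt();
        return new int[] {first, examined.get()};
    }

    public static void main(String[] args) {
        int[] r = findFirstAndCount();
        System.out.println("first match: " + r[0]);       // 999
        System.out.println("elements examined: " + r[1]); // at least 1000, varies per run
    }
}
```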

It must be emphasized that this also applies to the legal use of side effects, i.e. with peek or forEach, whose actions will not be undone in the exceptional case. If you use peek for its intended purpose (debugging), it's not an issue, as reporting that the element has been processed is still correct, even if the result is dropped due to a subsequent exception. If this is an issue for the action passed to forEach, because you don't want it to take place in the exceptional case, there is no way around collecting the elements first, e.g. via toArray or collect(toList()), and then calling forEach on the result after the stream operation has completed normally.
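A sketch contrasting the two approaches, with a hypothetical `step` function that fails for the value 7 and a list standing in for the external side effect. The direct `forEach` variant uses a sequential stream only so that the partial output is deterministic; the collect-first variant works with a parallel stream as-is, because the actions run on the collected list in the calling thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

final class CollectFirstDemo {
    private CollectFirstDemo() {}

    // Hypothetical processing step that fails for one input value.
    static int step(int i) {
        if (i == 7) {
            throw new IllegalArgumentException("cannot process " + i);
        }
        return i * 10;
    }

    // forEach inside the pipeline: actions already performed stay performed.
    static List<Integer> directForEach(List<Integer> input) {
        List<Integer> sink = new ArrayList<>(); // stands in for an external side effect
        try {
            input.stream().map(CollectFirstDemo::step).forEach(sink::add);
        } catch (IllegalArgumentException e) {
            // the exception reaches us, but earlier additions to sink remain
        }
        return sink;
    }

    // Collect first, act afterwards: the actions run only after normal completion.
    static List<Integer> collectThenForEach(List<Integer> input) {
        List<Integer> sink = new ArrayList<>();
        try {
            List<Integer> results = input.parallelStream()
                    .map(CollectFirstDemo::step)
                    .collect(Collectors.toList());
            results.forEach(sink::add); // runs in the calling thread
        } catch (RuntimeException e) {
            // nothing was added to sink
        }
        return sink;
    }

    public static void main(String[] args) {
        List<Integer> input = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            input.add(i);
        }
        System.out.println(directForEach(input));      // [0, 10, 20, 30, 40, 50, 60]
        System.out.println(collectThenForEach(input)); // []
    }
}
```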

Of course, this is not necessary if the action only modifies the state of something that has its own rollback mechanism, like sending each element to a database.

In some cases, the stream's reading operation does modify the state of the source, e.g. when reading numbers from a random number generator, lines from a BufferedReader, or tokens from a Scanner (Java 9). In these cases, the operation also has an impact on the source that cannot be undone.

In the case of BufferedReader.lines() and Scanner.tokens(), the documentation explicitly states that the reader/scanner is in an unspecified state after the operation, even in the non-exceptional case, and random number generators are usually treated as producing unpredictable numbers anyway. So in none of these cases does the absence of a rollback cause an issue.
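For the random number generator case, a small sketch (hypothetical class and method names) shows that streaming values out of a `java.util.Random` advances it just like calling `nextInt()` directly. The Javadoc of `Random.ints(long)` describes each value as generated as if by `nextInt()`, and the comparison below relies on that behavior.

```java
import java.util.Random;

final class SourceStateDemo {
    private SourceStateDemo() {}

    // Streams n values out of the generator, then returns the generator's next value.
    static int nextAfterStreaming(Random rng, int n) {
        rng.ints(n).toArray(); // traversal draws n values from rng, advancing its state
        return rng.nextInt();
    }

    // The value a fresh generator with the same seed yields after n plain nextInt() calls.
    static int nextAfterCalls(long seed, int n) {
        Random rng = new Random(seed);
        for (int i = 0; i < n; i++) {
            rng.nextInt();
        }
        return rng.nextInt();
    }

    public static void main(String[] args) {
        int next = nextAfterStreaming(new Random(42L), 3);
        // The generator does not rewind: the next value is its 4th, not its 1st.
        System.out.println(next == nextAfterCalls(42L, 3)); // true
    }
}
```

`toArray()` is used instead of `count()` on purpose: since Java 9, `count()` may skip traversal entirely for a sized stream, in which case no values would be drawn at all.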

Holger
  • seems like lots of people understood this (at least 4 upvotes, one of them mine), but I still feel like an idiot because I cannot understand this: *For some cases, the streams reading operation does modify the state of the source*. How would reading a source modify it? – Eugene Nov 13 '17 at 21:52
  • I will assume that reading a file might move the pointer (cursor?) and generating random numbers would somehow mess with the `PRNG` used underneath - but no clue – Eugene Nov 13 '17 at 21:54
  • @Eugene: reading from a file does not modify the file, but reading from a `BufferedReader` does modify the `BufferedReader`, so there’s a significant difference between calling `BufferedReader.lines()`, where you have to be aware that this makes the `BufferedReader` unusable, and `Files.lines()`, which you can invoke multiple times. And yes, reading the next value from a PRNG modifies the state of the PRNG. – Holger Nov 14 '17 at 07:21
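To illustrate Holger's reply, a sketch with hypothetical helper names: `Files.lines` opens a new reader each time, so the same file can be streamed repeatedly with the same result, whereas `BufferedReader.lines()` consumes the reader it is called on.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

final class LinesSourceDemo {
    private LinesSourceDemo() {}

    // Writes `content` to a temporary file and counts its lines twice with Files.lines.
    // Each call opens its own reader, so streaming the file does not "use it up".
    static long[] countTempFileTwice(List<String> content) {
        Path file = null;
        try {
            file = Files.createTempFile("lines-demo", ".txt");
            Files.write(file, content);
            long[] counts = new long[2];
            for (int i = 0; i < counts.length; i++) {
                try (Stream<String> lines = Files.lines(file)) {
                    counts[i] = lines.count();
                }
            }
            return counts;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            deleteQuietly(file);
        }
    }

    // BufferedReader.lines() reads from the reader itself; after the terminal
    // operation the reader's position is unspecified, so it should not be reused.
    static long countReaderLines(BufferedReader reader) {
        return reader.lines().count();
    }

    private static void deleteQuietly(Path file) {
        if (file == null) {
            return;
        }
        try {
            Files.deleteIfExists(file);
        } catch (IOException ignored) {
            // best-effort cleanup of the temporary file
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(Arrays.toString(countTempFileTwice(Arrays.asList("alpha", "beta", "gamma")))); // [3, 3]
        try (BufferedReader reader = new BufferedReader(new StringReader("alpha\nbeta\ngamma"))) {
            System.out.println(countReaderLines(reader)); // 3
        }
    }
}
```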