5

While going through articles of sequential streams the question came in my mind that are there any performance benefits of using sequential streams over traditional for loops or streams are just sequential syntactic sugar with an additional performance overhead?

Consider Below Example where I can not see any performance benefits of using sequential streams:

Stream.of("d2", "a2", "b1", "b3", "c")
    .filter(s -> {
        System.out.println("filter: " + s);
        return s.startsWith("a");
})
    .forEach(s -> System.out.println("forEach: " + s));

Using classic java:

String[] strings = {"d2", "a2", "b1", "b3", "c"};
        for (String s : strings)
        {
            System.out.println("Before filtering: " + s);
            if (s.startsWith("a"))
            {
                System.out.println("After Filtering: " + s);
            }
        }

Point Here is in streams processing of a2 starts only after all the operations on d2 is complete(Earlier I thought while d2 is being processed by foreach ,filter would have strated operating on a2 but that is not the case as per this article : https://winterbe.com/posts/2014/07/31/java8-stream-tutorial-examples/), same is the case with classic java, so what should be the motivation of using streams beyond "expressive" and "elegant" coding style?I know there are performance overheads for compiler while handling streams, does anyone know/have experienced about any performance benefits while using sequential streams?

Naman
  • 27,789
  • 26
  • 218
  • 353
Hitesh
  • 432
  • 3
  • 13

4 Answers4

4

First of all, letting special cases, like omitting a redundant sorted operation or returning the known size on count(), aside, the time complexity of an operation usually doesn’t change, so all differences in execution timing are usually about a constant offset or a (rather small) factor, not fundamental changes.


You can always write a manual loop doing basically the same as the Stream implementation does internally. So, internal optimizations, as mentioned by this answer could always get dismissed with “but I could do the same in my loop”.

But… when we compare “the Stream” with “a loop”, is it really reasonable to assume that all manual loops are written in the most efficient manner for the particular use case? A particular Stream implementation will apply its optimizations to all use cases where applicable, regardless of the experience level of the calling code’s author. I’ve already seen loops missing the opportunity to short-circuit or performing redundant operations not needed for a particular use case.

Another aspect is the information needed to perform certain optimizations. The Stream API is built around the Spliterator interface which can provide characteristics of the source data, e.g. it allows to find out whether the data has a meaningful order needed to be retained for certain operations or whether it is already pre-sorted, to the natural order or with a particular comparator. It may also provide the expected number of elements, as an estimate or exact, when predictable.

A method receiving an arbitrary Collection, to implement an algorithm with an ordinary loop, would have a hard time to find out, whether there are such characteristics. A List implies a meaningful order, whereas a Set usually does not, unless it’s a SortedSet or a LinkedHashSet, whereas the latter is a particular implementation class, rather than an interface. So testing against all known constellations may still miss 3rd party implementations with special contracts not expressible by a predefined interface.

Of course, since Java 8, you could acquire a Spliterator yourself, to examine these characteristics, but that would change your loop solution to a non-trivial thing and also imply repeating the work already done with the Stream API.


There’s also another interesting difference between Spliterator based Stream solutions and conventional loops, using an Iterator when iterating over something other than an array. The pattern is to invoke hasNext on the iterator, followed by next, unless hasNext returned false. But the contract of Iterator does not mandate this pattern. A caller may invoke next without hasNext, even multiple times, when it is known to succeed (e.g. you do already know the collection’s size). Also, a caller may invoke hasNext multiple times without next in case the caller did not remember the result of the previous call.

As a consequence, Iterator implementations have to perform redundant operations, e.g. the loop condition is effectively checked twice, once in hasNext, to return a boolean, and once in next, to throw a NoSuchElementException when not fulfilled. Often, the hasNext has to perform the actual traversal operation and store the result into the Iterator instance, to ensure that the result stays valid until the subsequent next call. The next operation in turn, has to check whether such a traversal did already happen or whether it has to perform the operation itself. In practice, the hot spot optimizer may or may not eliminate the overhead imposed by the Iterator design.

In contrast, the Spliterator has a single traversal method, boolean tryAdvance(Consumer<? super T> action), which performs the actual operation and returns whether there was an element. This simplifies the loop logic significantly. There’s even the void forEachRemaining(Consumer<? super T> action) for non-short-circuiting operations, which allows the actual implementation to provide the entire looping logic. E.g., in case of ArrayList the operation will end up at a simple counting loop over the indices, performing a plain array access.

You may compare such design with, e.g. readLine() of BufferedReader, which performs the operation and returns null after the last element, or find() of a regex Matcher, which performs the search, updates the matcher’s state and returns the success state.

But the impact of such design differences is hard to predict in an environment with an optimizer designed specifically to identify and eliminate redundant operations. The takeaway is that there is some potential for Stream based solutions to turn out to be even faster, while it depends on a lot of factors whether it will ever materialize in a particular scenario. As said at the beginning, it’s usually not changing the overall time complexity, which would be more important to worry about.

Community
  • 1
  • 1
Holger
  • 285,553
  • 42
  • 434
  • 765
1

Streams might (and have some tricks already) under the hood, that a traditional for-loop does not. For example:

Arrays.asList(1,2,3)
      .map(x -> x + 1)
      .count();

Since java-9, map will be skipped, since you don't really care about it.

Or internal implementation might check if a certain data structure is already sorted, for example:

someSource.stream()
          .sorted()
          ....

If someSource is already sorted (like a TreeSet), in such a case sorted would be a no-op. There are many of these optimizations that are done internally and there is ground for even more that may be will be done in the future.

Eugene
  • 117,005
  • 15
  • 201
  • 306
  • such optimizations are also achievable programattically : for example (!(o instanceOf TreeSet)) then do... we can say that by doing this intelligently they reduce the number of lines of codes for us to write , but are there any performance benefits which we can not achieve by classic coding style? – Hitesh Jan 17 '19 at 15:31
  • @Hitesh of course they are, but there are many more which will be not that easy to track – Eugene Jan 17 '19 at 15:32
0

If you were to use streams still, you could have created a stream out of your array using Arrays.stream and used a forEach as:

Arrays.stream(strings).forEach(s -> {
    System.out.println("Before filtering: " + s);
    if (s.startsWith("a")) {
        System.out.println("After Filtering: " + s);
    }
});

On the performance note, since you would be willing to traverse the entire array, there is no specific benefit from using streams over loops. More about it has been discussed In Java, what are the advantages of streams over loops? and other linked questions.

Naman
  • 27,789
  • 26
  • 218
  • 353
0

enter image description hereIf using stream, we can use with parallel(), as bellow:

Stream<String> stringStream = Stream.of("d2", "a2", "b1", "b3", "c")
            .parallel()
            .filter(s -> s.startsWith("d"));

It's faster because your computer will normally be able to run more than one thread together.

Test it's:

@Test
public void forEachVsStreamVsParallelStream_Test() {
    IntStream range = IntStream.range(Integer.MIN_VALUE, Integer.MAX_VALUE);
    StopWatch stopWatch = new StopWatch();

    stopWatch.start("for each");
    int forEachResult = 0;
    for (int i = Integer.MIN_VALUE; i < Integer.MAX_VALUE; i++) {
        if (i % 15 == 0)
            forEachResult++;
    }
    stopWatch.stop();


    stopWatch.start("stream");
    long streamResult = range
            .filter(v -> (v % 15 == 0))
            .count();
    stopWatch.stop();


    range = IntStream.range(Integer.MIN_VALUE, Integer.MAX_VALUE);
    stopWatch.start("parallel stream");
    long parallelStreamResult = range
            .parallel()
            .filter(v -> (v % 15 == 0))
            .count();
    stopWatch.stop();

    System.out.println(String.format("forEachResult: %s%s" +
                    "parallelStreamResult: %s%s" +
                    "streamResult: %s%s",
            forEachResult, System.lineSeparator(),
            parallelStreamResult, System.lineSeparator(),
            streamResult, System.lineSeparator()));

    System.out.println("prettyPrint: " + stopWatch.prettyPrint());
    System.out.println("Time Elapsed: " + stopWatch.getTotalTimeSeconds());
}
Reuven
  • 1
  • 2