
I was told that Java Streams are a good choice for processing a large amount of data, so I recently ran a comparison test. However, the result was unexpected:

The problem is from Codewars:

Suppose there is a metro with 100 people on board initially. At each stop, several people get on and several people get off. The goal is to count the number of people remaining on the metro after a large number of stops (100000).

Here is my code:

import java.util.ArrayList;

public class Metro1 {
    private final static int STOPS = 100000;
    private static ArrayList<int[]> metro = new ArrayList<int[]>();

    public static int sum1() {
        int sum = 0;
        for(int[] x: metro) {
            sum +=x[0] - x[1];
        }
        return sum;
    }

    public static int sum2() {
        return metro.stream()
                .mapToInt(x -> x[0]-x[1])
                .sum();
    }
    public static void main(String[] args) {
        long start=0;
        long end = 0;
        metro.add(new int[] {100,0});
        for(int i=1;i<STOPS;i++) {
            int in = (int) Math.round(Math.random() * 10);
            int out = (int) Math.round(Math.random() * 10);
            metro.add(new int[] {in,out});
        }
        System.out.println("Stops: " + metro.size());

        start = System.currentTimeMillis();
        System.out.println("sum1: " + sum1());
        end = System.currentTimeMillis();
        System.out.println("sum1 (for loop): " + String.valueOf(end-start) + " milliseconds.");


        start = System.currentTimeMillis();
        System.out.println("sum2: " + sum2());
        end = System.currentTimeMillis();
        System.out.println("sum1 (stream): " + String.valueOf(end-start) + " milliseconds.");

    }

}

I ran the code in Eclipse and found that sum1 is much faster than sum2:

Stops: 100000
sum1: 79
sum1 (for loop): 6 milliseconds.
sum2: 79
sum1 (stream): 68 milliseconds.

I thought the code was simple enough, so why is the stream slower than the for loop?

Thanks,

Alex

Alex Hou
    Streams are slower than for loops, simply because they do more work. The advantage is principally readability (and parallelizability; but parallel streams are useful far less often than the puff around streams would have you believe). – Andy Turner Apr 19 '20 at 14:36
    [Java performance tutorial – How fast are the Java 8 streams?](https://jaxenter.com/java-performance-tutorial-how-fast-are-the-java-8-streams-118830.html) [Java 8’s streams are slow and why it doesn’t matter](https://danonrockstar.com/java-8s-streams-are-slow-and-why-it-doesn-t-matter-eb0bfdcbfbd3). You have got a search engine, haven’t you? – Ole V.V. Apr 19 '20 at 15:01
  • Does this answer your question? [How do I write a correct micro-benchmark in Java?](https://stackoverflow.com/questions/504103/how-do-i-write-a-correct-micro-benchmark-in-java) – Ole V.V. Apr 19 '20 at 15:03

2 Answers


Just like regexes where a specific homegrown parser can be faster, streams are there to provide a quick and concise means of processing data. One of the advantages of streams is that intermediate data structures can be minimized while processing the elements. The other is the parallel aspect that has already been mentioned in the comments.
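To illustrate the point about intermediate data structures, here is a minimal sketch. The class name `FusedPipeline` and the method `netAtBoardingStops` are made up for this example and are not from the question. The filter and the mapping run in one pass over the list, and no temporary list of filtered stops or of differences is built.

```java
import java.util.Arrays;
import java.util.List;

public class FusedPipeline {

    // Hypothetical helper: sums (in - out) only for stops where someone boarded.
    // Each element flows through filter -> mapToInt -> sum one at a time,
    // so no intermediate collection is materialized.
    public static int netAtBoardingStops(List<int[]> stops) {
        return stops.stream()
                .filter(s -> s[0] > 0)
                .mapToInt(s -> s[0] - s[1])
                .sum();
    }

    public static void main(String[] args) {
        List<int[]> stops = Arrays.asList(
                new int[] {100, 0},  // kept: contributes 100
                new int[] {0, 5},    // dropped by the filter
                new int[] {3, 1});   // kept: contributes 2
        System.out.println(netAtBoardingStops(stops)); // prints 102
    }
}
```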

But as to your example.

  • Relying on simple performance tests using internal clocks (even though I do it too) is not the best way to accurately assess performance. Use something like the Java Microbenchmark Harness (JMH) for testing.
  • As to your result, try it with the following:

    • Change STOPS to 100_000_000
    • Modify your stream to
      return metro.stream().parallel().mapToInt(x -> x[0]-x[1]).sum();

Here are the results on my Windows quad-core i7 laptop:

Stops: 100000000
sum1: -7073
sum1 (for loop): 908 milliseconds.
sum2: -7073
sum1 (stream): 518 milliseconds.
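If JMH is not at hand, a rough hand-rolled version of the same idea is to warm both variants up before timing them. The sketch below is not a substitute for JMH, and its numbers will vary by machine and JVM. The class `MetroWarmup` and the helper `buildMetro` are names invented for this example, and `Random.nextInt(11)` is used in place of the question's `Math.round(Math.random() * 10)`, so the data distribution differs slightly. In a single-shot timing like the one in the question, the first stream call also pays one-time costs such as loading the stream classes and linking the lambda, which likely accounts for a good part of the gap.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class MetroWarmup {

    // Hypothetical helper: builds data shaped like the question's list
    // (first entry is the initial 100 passengers, then random in/out pairs).
    public static List<int[]> buildMetro(int stops, long seed) {
        Random rnd = new Random(seed);
        List<int[]> metro = new ArrayList<>(stops);
        metro.add(new int[] {100, 0});
        for (int i = 1; i < stops; i++) {
            metro.add(new int[] {rnd.nextInt(11), rnd.nextInt(11)});
        }
        return metro;
    }

    public static int sumLoop(List<int[]> metro) {
        int sum = 0;
        for (int[] x : metro) {
            sum += x[0] - x[1];
        }
        return sum;
    }

    public static int sumStream(List<int[]> metro) {
        return metro.stream().mapToInt(x -> x[0] - x[1]).sum();
    }

    public static void main(String[] args) {
        List<int[]> metro = buildMetro(100_000, 42L);

        // Warm-up: run both variants repeatedly so class loading, lambda
        // linkage and JIT compilation happen before the timed calls.
        int sink = 0;
        for (int i = 0; i < 200; i++) {
            sink += sumLoop(metro);
            sink += sumStream(metro);
        }

        long t0 = System.nanoTime();
        int loopResult = sumLoop(metro);
        long t1 = System.nanoTime();
        int streamResult = sumStream(metro);
        long t2 = System.nanoTime();

        System.out.println("for loop: " + loopResult + " in " + (t1 - t0) / 1_000 + " microseconds");
        System.out.println("stream:   " + streamResult + " in " + (t2 - t1) / 1_000 + " microseconds");
        // Print the accumulated value so the warm-up work is not discarded.
        System.out.println("warm-up sink: " + sink);
    }
}
```

A single timed call after warm-up is still noisy; repeating the timed section and looking at the spread gives a better picture, which is what JMH automates.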

WJS
  • Thanks. However, I could not make the stream run faster even when I used parallel(); I actually tried parallel before posting this question. My laptop is a dual-core Mac. – Alex Hou Apr 20 '20 at 15:32
    I double-checked my code and found a bug in it. Now the stream with parallel() runs much faster, almost the same as the for loop. Thanks again for your detailed information! – Alex Hou Apr 21 '20 at 14:20

The Stream API is more about code readability and maintainability. For loops may perform better, but performance is not always the most important metric. Having maintainable code is equally important.

We can use parallel streams with very large datasets, but again, better performance is not guaranteed, as it involves other overheads and depends on the available resources.
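A small sketch of how one might check this on a given machine follows. The class `ParallelSum` and its methods are hypothetical names for this example, and the data is synthetic. Both variants must return the same total; which one is faster depends on the data size and on the number of cores reported by `Runtime.getRuntime().availableProcessors()`, so the timings are only a rough indication.

```java
import java.util.ArrayList;
import java.util.List;

public class ParallelSum {

    // Hypothetical helper: deterministic synthetic {in, out} pairs.
    public static List<int[]> buildStops(int n) {
        List<int[]> stops = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            stops.add(new int[] {i % 7, i % 5});
        }
        return stops;
    }

    public static long sumSequential(List<int[]> stops) {
        return stops.stream().mapToLong(s -> s[0] - s[1]).sum();
    }

    // Same pipeline, but the work is split across the common fork/join pool.
    // Splitting and merging add overhead that small inputs may not repay.
    public static long sumParallel(List<int[]> stops) {
        return stops.parallelStream().mapToLong(s -> s[0] - s[1]).sum();
    }

    public static void main(String[] args) {
        System.out.println("cores: " + Runtime.getRuntime().availableProcessors());
        for (int n : new int[] {1_000, 1_000_000}) {
            List<int[]> stops = buildStops(n);

            long t0 = System.nanoTime();
            long seq = sumSequential(stops);
            long t1 = System.nanoTime();
            long par = sumParallel(stops);
            long t2 = System.nanoTime();

            System.out.println("n=" + n
                    + " sequential=" + seq + " (" + (t1 - t0) / 1_000 + " us)"
                    + " parallel=" + par + " (" + (t2 - t1) / 1_000 + " us)");
        }
    }
}
```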

Rahul Jain