
What's faster?

List<E> bar = new ArrayList<>();
pan.stream() /* other functions */.forEach(bar::add);

or

List<E> bar = pan.stream() /* other functions */.collect(Collectors.toList());
Stefan Zobel
Afonso Matos
  • Why are you asking us instead of profiling it? – Joseph Sible-Reinstate Monica Oct 28 '18 at 20:22
  • Do your *other functions* contain filtering? If not, you can use `Collectors.toCollection` instead and explicitly specify an initial size, to avoid resizing the backing array (a sketch follows these comments). – Jacob G. Oct 28 '18 at 20:35
  • Clarity almost always trumps micro-optimizations. The latter is more functional, and likely to be the best option. – Peter Lawrey Oct 28 '18 at 20:35
  • That's the beauty of Java 8's *declarative* programming paradigm: you only need to tell Java *what* needs to be done, here to collect data into a collection, and then let Java worry about *how* to do it. Your second example is a much cleaner example of exactly this. – Hovercraft Full Of Eels Oct 28 '18 at 21:23
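A rough sketch of the `Collectors.toCollection` idea from Jacob G.'s comment (hedged: `EXPECTED_SIZE` and the class name are illustrative, and pre-sizing only pays off when the element count is known up front, i.e. the other functions do not filter):

import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class PresizedCollect {
    // Illustrative size; in the question it would be the known size of `pan`.
    private static final int EXPECTED_SIZE = 1_000_000;

    public static void main(String[] args) {
        List<Double> bar = Stream.iterate(0.0, e -> Math.random())
                .limit(EXPECTED_SIZE)
                // Supply a pre-sized ArrayList so the backing array is never resized.
                .collect(Collectors.toCollection(() -> new ArrayList<>(EXPECTED_SIZE)));
        System.out.println(bar.size());
    }
}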

1 Answer


I've tested these two scenarios with a stream of 1 million elements. Overall there is almost no difference for a sequential stream, but there is a difference for a parallel stream:

Benchmark                    Mode  Cnt  Score    Error   Units
Performance.collect          avgt  200  0.022 ±  0.001   s/op
Performance.forEach          avgt  200  0.021 ±  0.001   s/op
Performance.collectParallel  avgt  200  0.124 ±  0.004   s/op
Performance.forEachParallel  avgt  200  0.131 ±  0.001   s/op

In my opinion you shouldn't build a list using forEach, because it breaks the rule of functional purity (the lambda mutates state outside the stream), and collect is also slightly more efficient when used with a parallel stream.

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.infra.Blackhole;

// Sequential stream collected with Collectors.toList()
@Benchmark @BenchmarkMode(Mode.AverageTime)
public void collect(Blackhole blackhole) {
    Stream<Double> stream = Stream.iterate(0.0, e -> Math.random());
    List<Double> list = stream.limit(1000000).collect(Collectors.toList());
    blackhole.consume(list);
}

// Sequential stream, elements appended to a plain ArrayList via forEach
@Benchmark @BenchmarkMode(Mode.AverageTime)
public void forEach(Blackhole blackhole) {
    Stream<Double> stream1 = Stream.iterate(0.0, e -> Math.random());
    List<Double> list = new ArrayList<>();
    stream1.limit(1000000).forEach(list::add);
    blackhole.consume(list);
}

// Parallel stream collected with Collectors.toList()
@Benchmark @BenchmarkMode(Mode.AverageTime)
public void collectParallel(Blackhole blackhole) {
    Stream<Double> stream = Stream.iterate(0.0, e -> Math.random());
    List<Double> list = stream.parallel().limit(1000000).collect(Collectors.toList());
    blackhole.consume(list);
}

// Parallel stream, elements appended to a synchronized list via forEach
// (a plain ArrayList is not thread-safe and would be corrupted here)
@Benchmark @BenchmarkMode(Mode.AverageTime)
public void forEachParallel(Blackhole blackhole) {
    Stream<Double> stream1 = Stream.iterate(0.0, e -> Math.random());
    List<Double> list = Collections.synchronizedList(new ArrayList<>());
    stream1.parallel().limit(1000000).forEach(list::add);
    blackhole.consume(list);
}
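
For completeness, a minimal sketch of how these benchmarks could be launched with JMH's Runner API (assuming the methods above live in a class named Performance, as the results table suggests; the warmup, measurement and fork settings here are illustrative, not necessarily those used for the numbers above):

import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

public class PerformanceRunner {
    public static void main(String[] args) throws RunnerException {
        // Select all benchmark methods in the (assumed) Performance class.
        Options opt = new OptionsBuilder()
                .include("Performance")
                .warmupIterations(5)
                .measurementIterations(10)
                .forks(1)
                .build();
        new Runner(opt).run();
    }
}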
Michael Dz
  • This is not an accurate benchmark. I recommend using an existing framework such as JMH. – Jacob G. Oct 28 '18 at 21:15
  • ArrayIndexOutOfBoundsException? How? – Joe C Oct 28 '18 at 21:32
  • Also, did you check the accuracy of this solution? I have found `null`s get into `ArrayList`s in the past using the `forEach(list::add)` method with parallel streams. – Joe C Oct 28 '18 at 21:33
  • @JacobG Yes, indeed it isn't the most accurate benchmark, but it shouldn't matter because the difference is big. JMH should become more popular with wider adoption of Java 9, which I currently don't have. `ArrayIndexOutOfBoundsException` occurs because one thread tries to add an element while another thread is resizing the underlying array. I should have used a synchronized list instead; will update it soon. You probably had nulls because of thread interference on an unsynchronized list. – Michael Dz Oct 28 '18 at 21:43
  • The difference is **not** big whatsoever. Run the code in a different order and you might be surprised. – Jacob G. Oct 28 '18 at 21:44
  • What a nice test! Compares a 10 million stream to a 1 million stream in a [single method](https://stackoverflow.com/questions/24882946/java-loop-gets-slower-after-some-runs-jits-fault/24889503#24889503) within a single JVM, measuring an [OSR stub](https://stackoverflow.com/questions/9105505/differences-between-just-in-time-compilation-and-on-stack-replacement) that generates random numbers and performs lots of GC. – apangin Oct 29 '18 at 07:04
  • I made a terrible mistake with the size of the streams; updated the answer. – Michael Dz Oct 29 '18 at 10:24
  • Keep in mind that `Stream.iterate` does not play well with parallel in general, as each element depends on the previous. You may repeat all four tests using `ThreadLocalRandom.current().doubles(1000000).boxed()` as the stream (don’t apply an additional `limit`). Further, you could add `List list = Arrays.asList(stream[.parallel()].toArray(Double[]::new));` as an alternative `List` creation method… – Holger Oct 30 '18 at 08:46
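
A hedged sketch of the alternatives Holger describes in the last comment (the class and variable names are illustrative, and this shows only the stream construction, not the JMH harness):

import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;
import java.util.stream.Collectors;

public class AlternativeSources {
    public static void main(String[] args) {
        // A random DoubleStream splits well for parallel execution, unlike
        // Stream.iterate, where each element depends on the previous one.
        List<Double> collected = ThreadLocalRandom.current()
                .doubles(1_000_000)
                .boxed()
                .parallel()
                .collect(Collectors.toList());

        // Alternative List creation via toArray + Arrays.asList, as suggested.
        List<Double> viaArray = Arrays.asList(
                ThreadLocalRandom.current()
                        .doubles(1_000_000)
                        .boxed()
                        .parallel()
                        .toArray(Double[]::new));

        System.out.println(collected.size() + " / " + viaArray.size());
    }
}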