Would Stream.toList() perform better than Collectors.toList()

Question

JDK is introducing an API Stream.toList() with JDK-8180352. Here is a benchmarking code that I have attempted to compare its performance with the existing Collectors.toList:

@BenchmarkMode(Mode.All)
@Fork(1)
@State(Scope.Thread)
@Warmup(iterations = 20, time = 1, batchSize = 10000)
@Measurement(iterations = 20, time = 1, batchSize = 10000)
public class CollectorsVsStreamToList {

    @Benchmark
    public List<Integer> viaCollectors() {
        return IntStream.range(1, 1000).boxed().collect(Collectors.toList());
    }

    @Benchmark
    public List<Integer> viaStream() {
        return IntStream.range(1, 1000).boxed().toList();
    }
}

The result summary is as follows:

Benchmark                                                       Mode  Cnt   Score    Error  Units
CollectorsVsStreamToList.viaCollectors                         thrpt   20  17.321 ±  0.583  ops/s
CollectorsVsStreamToList.viaStream                             thrpt   20  23.879 ±  1.682  ops/s
CollectorsVsStreamToList.viaCollectors                          avgt   20   0.057 ±  0.002   s/op
CollectorsVsStreamToList.viaStream                              avgt   20   0.040 ±  0.001   s/op
CollectorsVsStreamToList.viaCollectors                        sample  380   0.054 ±  0.001   s/op
CollectorsVsStreamToList.viaCollectors:viaCollectors·p0.00    sample        0.051            s/op
CollectorsVsStreamToList.viaCollectors:viaCollectors·p0.50    sample        0.054            s/op
CollectorsVsStreamToList.viaCollectors:viaCollectors·p0.90    sample        0.058            s/op
CollectorsVsStreamToList.viaCollectors:viaCollectors·p0.95    sample        0.058            s/op
CollectorsVsStreamToList.viaCollectors:viaCollectors·p0.99    sample        0.062            s/op
CollectorsVsStreamToList.viaCollectors:viaCollectors·p0.999   sample        0.068            s/op
CollectorsVsStreamToList.viaCollectors:viaCollectors·p0.9999  sample        0.068            s/op
CollectorsVsStreamToList.viaCollectors:viaCollectors·p1.00    sample        0.068            s/op
CollectorsVsStreamToList.viaStream                            sample  525   0.039 ±  0.001   s/op
CollectorsVsStreamToList.viaStream:viaStream·p0.00            sample        0.037            s/op
CollectorsVsStreamToList.viaStream:viaStream·p0.50            sample        0.038            s/op
CollectorsVsStreamToList.viaStream:viaStream·p0.90            sample        0.040            s/op
CollectorsVsStreamToList.viaStream:viaStream·p0.95            sample        0.042            s/op
CollectorsVsStreamToList.viaStream:viaStream·p0.99            sample        0.050            s/op
CollectorsVsStreamToList.viaStream:viaStream·p0.999           sample        0.051            s/op
CollectorsVsStreamToList.viaStream:viaStream·p0.9999          sample        0.051            s/op
CollectorsVsStreamToList.viaStream:viaStream·p1.00            sample        0.051            s/op
CollectorsVsStreamToList.viaCollectors                            ss   20   0.060 ±  0.007   s/op
CollectorsVsStreamToList.viaStream                                ss   20   0.043 ±  0.006   s/op

Of course, the first question to the domain experts would be if the benchmarking procedure is correct or not? The test class was executed on MacOS. Do let me know for any further details required.

Follow-up, as far as I could infer from the readings the average time, throughput, and sampling time of the Stream.toList looks better than the Collectors.toList. Is that understanding correct?

Side note: Based on the [implementation](https://github.com/openjdk/jdk/commit/41dbc139#diff-761882141e64725ba36ab6cb28dc7ea65deefa32f7b9810355de42126efefc6bR627) it looks like what you're really testing is the difference between `toArray()` and `collect(Collectors.toList())`. But that's of course only one implementation. — Slaw, Jan 15 '21 at 18:59
It is quite possible that `Stream::toList` is more efficient in some cases -- but it really depends on the details. `Stream::toList` builds on `toArray`, and for sources that have the SIZED (and ideally SUBSIZED, for parallel streams) characteristics, `toArray` is optimized to reduce reallocation and copying compared to `collect`. — Brian Goetz, Jan 15 '21 at 20:30
If you're trying to measure the difference between the two collectors, though, you should minimize the work of the rest of the stream pipline. I would move the source generation, and the boxing, out of the benchmark method and into an `@State` variable initialized outside the benchmark method, with something like `Stream.of(data).toList()`. The boxing is surely distorting your data. I'd also include parallel runs. — Brian Goetz, Jan 15 '21 at 20:33
@BrianGoetz did you mean to initialiise the source data into a static attribute and then simply using the `stream` under the benchmark something like `static final Set data = IntStream.range(0, 1000).boxed().collect(Collectors.toSet()); @Benchmark public List viaCollectors() { return data.stream().collect(Collectors.toList()); } @Benchmark public List viaCollectorsParallel() { return data.stream().parallel().collect(Collectors.toList()); }` ? Not really sure of how to do that with `@State`. If so, I can update with those results as well. — Naman, Jan 16 '21 at 03:53
@Naman There's a few ways to do it, but here's one: http://tutorials.jenkov.com/java-performance/jmh.html The main thing is that you want to move as much setup as possible out of the benchmark methods. I'd probably go with `Stream.of(array).collect(...)` in the benchmark method, rather than a Set, since arrays will give you a better-behaving `Spliterator` than `Set`. — Brian Goetz, Jan 16 '21 at 14:05
@BrianGoetz Thank you for sharing. I have tried to rephrase my tests based on my understanding and evaluated the results to be similar for non-parallel streams as shared in the question. For the `.parallel()` streams, the `Collectors.toList()` seems to be slightly better than the `Stream.toList()` under the results I could generate. I have refrained to pull in the complete result output here, but let me know if [any of it](https://github.com/fortybits/a-bit-of-benchmarking/blob/bits/src/main/java/jmh/CollectorsVsStreamToList.java)(comments after code) makes sense to be a part of the question. — Naman, Jan 17 '21 at 17:34
@BrianGoetz is there any reason why the code and documentation use `Collections.unmodifiableList(new ArrayList<>(Arrays.asList(this.toArray())))` instead of `Collections.unmodifiableList(Arrays.asList(this.toArray()))`? This additional copying step seems to serve no purpose. — Holger, Jan 18 '21 at 12:23
@Holger not really much, it's just that looking at the JDK code, I wanted to confirm the same behavior as your previous comment. But for that, I had performed the benchmarks on two completely different APIs. — Naman, Jan 18 '21 at 15:33
@Naman I don’t think that the actual Stream implementation provided by the JDK will end up at the default method, but rather have it overridden with a smarter implementation. Still, it’s annoying to have such an apparently unnecessary copying step even in documentation of the method. — Holger, Jan 18 '21 at 16:10
@Holger You're looking at the wrong code. You're seeing the _default_ implementation, which only comes into play with third-party implementations of streams. The actual implementation is in `ReferencePipeline`; that's the one that is actually being used. That said, it does seem the default code is doing an extra step. — Brian Goetz, Jan 18 '21 at 19:28
@Holger The extra step in the default implementation is there to force a defensive copy if `this.toArray` were to violate its spec and keep a reference to the returned array. Without the defensive copy, it would be possible to modify the list returned from the default `toList` implementation. — Stuart Marks, Jan 18 '21 at 21:35
@StuartMarks `toArray` is part of the Stream API, so if an implementation violates the spec and returns a shared array, callers of the `toArray` method could already break. Why should callers of the `toList()` method get more guarantees than callers of `toArray()`? It’s very simple: the methods do what the spec says, if the implementation respects the spec. It’s not the task of a default implementation to fix a potentially broken implementation. Defensive copies are good for essential classes like `String`, but not for a wrapper that never guaranteed to be a truly immutable collection anyway. — Holger, Jan 19 '21 at 07:57
@Holger We disagree, but I'm not going to argue this out here. — Stuart Marks, Jan 19 '21 at 16:39
@StuartMarks ok, this is not the place. Just one last comment, there are lots of places, where the result of JDK code depends on the correctness of an unknown implementation of an interface and having such a distrust/expensive defensive coding at this place looks arbitrary. And well, *in the documentation*, it definitely is unnecessary. — Holger, Jan 20 '21 at 07:28

Brian Goetz · Accepted Answer · 2021-01-31T17:00:40.043

Stream::toList is built atop toArray, not collect. There are a number of optimizations in toArray that make it potentially faster than collecting, though this depends heavily on the details. If the stream pipeline (from source through final intermediate operation) is SIZED, the target array can be presized (rather than potentially reallocating as the toList collector must do.) If the pipeline is further SUBSIZED, then parallel executions can not only presize the result array, but can compute exact per-shard offsets so each sub-task can drop its results in exactly the right place, eliminating the need for copying intermediate results into the final result.

So, depending on the details, toList may well be considerably faster than collect.

Even for unsized streams, the internally used spined buffer may be more efficient than filling an `ArrayList` like `Collectors.toList()` does. — Holger, Feb 01 '21 at 13:57

Would Stream.toList() perform better than Collectors.toList()

1 Answers1

Linked