2

Consider the following code:

List<String> myList = Arrays.asList(1, 2, 3);
String[] myArray1 = myList.toArray(new String[myList.size()]);
String[] myArray2 = myList.stream().toArray(String[]::new);
assert Arrays.equals(myArray1, myArray2);

It seems to me that using a stream is much simpler.

Therefore, I tested the speed of each.

List<String> myList = Arrays.asList("1", "2", "3");
double start;

start = System.currentTimeMillis();
for (int i = 0; i < 10_000_000; i++) {
    String[] myArray1 = myList.toArray(new String[myList.size()]);
    assert myArray1.length == 3;
}
System.out.println(System.currentTimeMillis() - start);

start = System.currentTimeMillis();
for (int i = 0; i < 10_000_000; i++) {
    String[] myArray2 = myList.stream().toArray(String[]::new);
    assert myArray2.length == 3;
}
System.out.println(System.currentTimeMillis() - start);

The result is that using the stream is about four times slower. On my machine, 816ms (stream) vs 187ms (no stream). I also tried switching the timing statements around (myArray2 prior to myArray1), which didn't effect the results much. Why is this so much slower? Is creating a Stream so computationally intensive?

I followed @Holger's advice and studied up a bit (surely not enough) on JVM testing, reading this post, this article, this article, and using JMH.


Results (via JMH):

private static final List<String> myList = IntStream.range(1, 1000).mapToObj(String::valueOf).collect(Collectors.toList());

@Benchmark
public void testMethod() {
    String[] myArray = myArrayList.stream().toArray(String[]::new);
}

StreamToArrayArrayListBenchmark.testMethod avgt 5 2846.346 ± 32.500 ns/op

private static final List<String> myList = IntStream.range(1, 1000).mapToObj(String::valueOf).collect(Collectors.toList());

@Benchmark
public void testMethod() {
    String[] myArray = myArrayList.toArray(new String[0]);
}

ToArrayEmptyArrayListBenchmark.testMethod avgt 5 1417.474 ± 20.725 ns/op

private static final List<String> myList = IntStream.range(1, 1000).mapToObj(String::valueOf).collect(Collectors.toList());

@Benchmark
public void testMethod() {
    String[] myArray = myArrayList.toArray(new String[myList.size()]);
}

ToArraySizedArrayListBenchmark.testMethod avgt 5 1853.622 ± 178.351 ns/op


private static final List<String> myList = new LinkedList<>(IntStream.range(1, 1000).mapToObj(String::valueOf).collect(Collectors.toList()));

@Benchmark
public void testMethod() {
    String[] myArray = myArrayList.stream().toArray(String[]::new);
}

StreamToArrayLinkedListBenchmark.testMethod avgt 5 4152.003 ± 59.281 ns/op

private static final List<String> myList = new LinkedList<>(IntStream.range(1, 1000).mapToObj(String::valueOf).collect(Collectors.toList()));

@Benchmark
public void testMethod() {
    String[] myArray = myArrayList.toArray(new String[0]);
}

ToArrayEmptyLinkedListBenchmark.testMethod avgt 5 4089.550 ± 29.880 ns/op

private static final List<String> myList = new LinkedList<>(IntStream.range(1, 1000).mapToObj(String::valueOf).collect(Collectors.toList()));

@Benchmark
public void testMethod() {
    String[] myArray = myArrayList.toArray(new String[myList.size()]);
}

ToArraySizedArrayListBenchmark.testMethod avgt 5 4115.557 ± 93.964 ns/op


To summarize:

              | ArrayList | LinkedList
stream        | 2846      | 4152
toArray sized | 1853      | 4115
toArray empty | 1417      | 4089

Using JMH (possibly naively), I'm still seeing that ArrayList::toArray is about twice as fast as Stream::toArray. However, this does seem to be because of the ability of the ArrayList to just do an array copy, as @Andreas pointed out because when the source is a LinkedList the results are about equal.

It's definitely good to know about myList.toArray(new String[0]).

Community
  • 1
  • 1
dantiston
  • 5,161
  • 2
  • 26
  • 30
  • What if you use `Arrays.stream()`? – Josh Lee Feb 02 '17 at 21:30
  • @JoshLee in my real code, I have a List that I'm trying to convert into an array. I'm not how I would use Arrays.stream() with it? – dantiston Feb 02 '17 at 23:04
  • @Holger are you saying by marking this as a duplicate that my conclusion is wrong? I'm not really asking about how to do benchmarking -- I assume my benchmarks aren't 100% accurate. I'm really asking about the overhead of a `Stream` -- that isn't covered in the other question. – dantiston Feb 03 '17 at 18:04
  • We are not talking about “100% accurate”, you are violating *every* rule mentioned there. Of course, conclusions drawn from such a fundamentally broken benchmark are wrong. For real life use cases, the practical difference between these approaches is close to zero. – Holger Feb 03 '17 at 18:09
  • @Holger Okay -- that's a reasonable response to my question. I still don't see how this is a duplicate whatsoever. If your answer is "your benchmarking is wrong -- these are equivalent", why not say so in an answer? – dantiston Feb 03 '17 at 18:12
  • Because the other Q&A provides you with plenty of examples what is wrong with your benchmark. Repeating them here wouldn’t be useful. But these are the answer to your question. My statement that both approaches are equivalent is not an answer, that’s just a comment. You would have to make a correct benchmark to prove it. That linked Q&A might help… – Holger Feb 03 '17 at 18:23
  • @Holger I followed your advice regarding writing the tests. Please see my revised question. – dantiston Feb 03 '17 at 20:10
  • Please, continue reading about the topic. You are on a good way, but it still immediately jumps into the reader’s eye, that your code doesn’t use the result and takes no measures to avoid being entirely eliminated by the optimizer. In a JMH test, you can avoid it by returning the result or calling the dedicated backhole/consumer method with the values as an argument. The reason, why `toArray` with an empty array is faster than with a pre-sized array, is explained [here](https://shipilev.net/blog/2016/arrays-wisdom-ancients/) – Holger Feb 06 '17 at 08:24

2 Answers2

9

Arrays.asList() creates a fixed-size List that is directly backed by the varargs array parameter. Javadoc even says so:

Returns a fixed-size list backed by the specified array.

Its implementation of toArray() is a simple System.arraycopy(). Very fast.

On the other hand, when you do myList.stream().toArray(String[]::new), the size is not known, so the Stream.toArray() method has to consume the stream, collect all the values, then create the array and copy the values into the array. That is a lot slower, and requires a lot more memory.

In short, it's a waste of resources.

If you want simpler, just don't give the array size. It is still way faster and less memory intensive than using Streams:

String[] myArray1 = myList.toArray(new String[0]);
Andreas
  • 154,647
  • 11
  • 152
  • 247
  • I didn't know that you can use toArray with an empty array. Thanks for the tip! It seems almost equal speed wise to giving it a pre-sized array. – dantiston Feb 02 '17 at 23:09
  • I just tried this with using a LinkedList instead of Arrays.asList() and the behavior is the same, and the times for LinkedList vs ArrayList are almost identical. It seems like all of the overhead is in the Stream. – dantiston Feb 02 '17 at 23:11
  • No matter how you slice it (so to speak), array copy is only a single call to the JVM, whereas any library solution is going to have at least one explicit loop. – Josh Lee Feb 03 '17 at 00:32
  • @JoshLee I understand that -- however, the evidence doesn't support the claim that the speed is due to it being an ArrayList. LinkedList::toArray (inherited from AbstractList::toArray) is iterating over the list to populate the array. It's not an array copy. Yet, it is nowhere near as bad as the `Stream` version. – dantiston Feb 03 '17 at 01:38
  • 3
    The Stream created by a `List`, having no `filter` etc. *does* know its size and it utilizes it in the `toArray` implementation. Still, in an unoptimized execution it’s likely to be slower than any hardcoded operation, simply because this flawed benchmark records the time needed to load, verify and initialize the `Stream` class and all of its internally used classes, as well as the `LambdaMetaFactory` and its dependencies. – Holger Feb 03 '17 at 13:32
0

Under the hood, streams are much more complicated than plain arrays. Compilers will get better, but currently, sequential for loops should be faster than stream operations.

This article has some background about stream pipelines, which are used to implement streams. It can help to understand the complexity behind it.

The advantage of streams is that the code can be clearer and it is easier to parallelize it.

Philipp Claßen
  • 41,306
  • 31
  • 146
  • 239