
I need to convert a list of ids to an array of ids. I can do it in several ways, but I am not sure which one should be used.

Say,

1. ids.stream().toArray(Id[]::new)
2. ids.toArray(new Id[ids.length])

Which one is more efficient and why?

Stefan Zobel
Nishant Lakhara
  • Unless you actually want to use the stream API, I can't think of a good reason to use the first approach. I can't imagine the performance difference would justify the decreased readability of the code. – Chris Neve Mar 29 '19 at 09:05
  • Do you _really_ have performance issues? How many lists are we talking about? Can't that conversion be avoided in the first place? – Thomas Mar 29 '19 at 09:07
  • First: what do you mean by efficient? Less memory use, less CPU use, fastest? Second: is your array big enough to really make a difference? If not, use the more readable option, not the more efficient one. – vincrichaud Mar 29 '19 at 09:10
  • Efficient means execution speed is faster. The array may contain 6 million entries – Nishant Lakhara Mar 29 '19 at 09:20
  • @vincrichaud not everything that is obvious turns out to be faster, for example: `ids.toArray(new Id[ids.length])` will be slower than `ids.toArray(new Id[0])` – Eugene Mar 29 '19 at 09:23
  • @Eugene that's not what I said; I agree with you that the more readable option is not always the fastest. What I mean is that you should prioritize what you are coding. If we are talking about an array of 20 entries, such optimization should not be considered, and we should use the more readable option. If (like the OP) you deal with millions of entries, then optimization becomes interesting and may take precedence over readability (that is where commenting code becomes useful). – vincrichaud Mar 29 '19 at 09:31
  • @vincrichaud right, the OP said 6 million though – Eugene Mar 29 '19 at 09:32
  • `ids.toArray(new Id[ids.length])` won’t compile. You meant `ids.toArray(new Id[ids.size()])`, but as @Eugene already mentioned, that won’t be better than `ids.toArray(new Id[0])`… – Holger Mar 29 '19 at 10:05

1 Answer


Java 11 introduced Collection::toArray(IntFunction<T[]>), a default method with this implementation:

default <T> T[] toArray(IntFunction<T[]> generator) {
    return toArray(generator.apply(0));
}

Put simply, in your case it is actually doing ids.toArray(new Id[0]); that is, it does not specify the total expected size.
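As a small sketch of the point above, the generator below records the size it is asked for. The `Id` class and the `GeneratorSizeDemo` wrapper are hypothetical stand-ins for illustration, and the sketch assumes Java 11 or later and a plain `java.util.ArrayList`, which relies on the default method quoted above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

final class GeneratorSizeDemo {

    // Hypothetical stand-in for the question's Id type.
    static final class Id {
        final long value;
        Id(long value) { this.value = value; }
    }

    // Size most recently passed to the generator, or -1 if it was never called.
    static int lastRequestedSize = -1;

    static Id[] convert(List<Id> ids) {
        IntFunction<Id[]> recordingGenerator = size -> {
            lastRequestedSize = size;   // remember what the collection asked for
            return new Id[size];
        };
        // Java 11+: Collection.toArray(IntFunction<T[]>)
        return ids.toArray(recordingGenerator);
    }

    public static void main(String[] args) {
        List<Id> ids = new ArrayList<>();
        for (long i = 0; i < 5; i++) {
            ids.add(new Id(i));
        }
        Id[] array = convert(ids);
        System.out.println("generator was asked for size: " + lastRequestedSize);
        System.out.println("resulting array length: " + array.length);
    }
}
```

With the default implementation, the generator only supplies a zero-length array carrying the element type; the properly sized result array is then allocated inside `toArray(T[])`.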

This is faster than specifying the size, which is counter-intuitive. It has to do with the fact that if the JVM can prove that the array you are allocating is going to be overwritten by a copy that immediately follows, it does not have to zero the array first. That proves to be faster than passing in a pre-sized array (where the zeroing has to happen).
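For reference, here are the variants under discussion side by side. This is only a sketch: `Id` and `ToArrayVariants` are hypothetical names, and the code checks that the variants produce the same array rather than measuring speed (a proper comparison would need a benchmark harness such as JMH).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ToArrayVariants {

    // Hypothetical stand-in for the question's Id type.
    static final class Id {
        final long value;
        Id(long value) { this.value = value; }
    }

    // Zero-length array: toArray allocates the correctly sized result itself.
    static Id[] viaEmptyArray(List<Id> ids) {
        return ids.toArray(new Id[0]);
    }

    // Pre-sized array: note size(), not length, on a List.
    static Id[] viaPresizedArray(List<Id> ids) {
        return ids.toArray(new Id[ids.size()]);
    }

    // Stream variant from the question.
    static Id[] viaStream(List<Id> ids) {
        return ids.stream().toArray(Id[]::new);
    }

    public static void main(String[] args) {
        List<Id> ids = new ArrayList<>();
        for (long i = 0; i < 1_000; i++) {
            ids.add(new Id(i));
        }
        Id[] a = viaEmptyArray(ids);
        Id[] b = viaPresizedArray(ids);
        Id[] c = viaStream(ids);
        // All three hold the same element references in the same order.
        System.out.println(Arrays.equals(a, b) && Arrays.equals(b, c));
    }
}
```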

The stream approach will pass an initial size (either known or estimated) that the stream internals compute, because:

 ids.stream().toArray(Id[]::new)

is actually:

 ids.stream().toArray(size -> new Id[size]);

and that size is either known or estimated, based on the characteristics that the Spliterator reports. If the stream reports the SIZED characteristic (as in your simple case), then it's easy: the size is always known. If SIZED is not present, the stream internals only have an estimate of how many elements there will be, and in that case a hidden intermediate collection, called SpinedBuffer, is used to capture the elements.
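Whether a source reports SIZED can be checked directly on its Spliterator. The sketch below is only an illustration (the `SizedCharacteristicDemo` class and `isSized` helper are hypothetical names); it compares a plain ArrayList, a filtered stream over it, and a concurrent collection.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Spliterator;
import java.util.concurrent.ConcurrentLinkedQueue;

final class SizedCharacteristicDemo {

    // True if the spliterator promises an exact element count up front.
    static boolean isSized(Spliterator<?> spliterator) {
        return spliterator.hasCharacteristics(Spliterator.SIZED);
    }

    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>(Arrays.asList(1, 2, 3, 4));

        // An ArrayList knows its size exactly.
        System.out.println("ArrayList: " + isSized(list.spliterator()));

        // After filter(), the number of surviving elements is unknown in advance.
        System.out.println("filtered stream: "
                + isSized(list.stream().filter(n -> n % 2 == 0).spliterator()));

        // A concurrent queue can change while it is being traversed.
        System.out.println("ConcurrentLinkedQueue: "
                + isSized(new ConcurrentLinkedQueue<>(list).spliterator()));
    }
}
```

Only the first source lets the stream allocate the final array in one step; for the other two the stream has to collect the elements before it knows how large the array must be.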

You can read more here, but the approach ids.toArray(new Id[0]) will be the fastest.

Eugene
  • Not every `Collection` produces a `SIZED` stream, i.e. concurrent collections won’t. But for concurrent collections, `ids.toArray(new Id[ids.size()])` would even be broken unless the application can preclude concurrent modifications during the operation. So it boils down to `ids.toArray(new Id[0])` being the simplest, least error-prone, *and* most efficient solution. – Holger Mar 29 '19 at 10:08