Why does the Java Stream API omit the 'toArray(T[] a)' overload for writing to an existing array?

Question

While the Collection API provides three overloads of toArray

Object[] toArray()
T[] toArray(IntFunction<T[]> generator)
T[] toArray(T[] a)

the Stream API only provides the first two of those. That is, it does not provide a way to use an existing array as the return value of the stream-to-array conversion.
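To make the difference concrete, here is a minimal sketch of the available calls, assuming Java 11 or later (the generator overload on Collection was only added in Java 11). The class name is arbitrary.

```java
import java.util.Arrays;
import java.util.List;

class ToArrayOverloads {
    public static void main(String[] args) {
        List<String> list = List.of("a", "b", "c");

        // Collection: all three overloads are available.
        Object[] c1 = list.toArray();
        String[] c2 = list.toArray(String[]::new);   // since Java 11
        String[] c3 = list.toArray(new String[0]);

        // Stream: only the no-arg and the generator overloads exist.
        Object[] s1 = list.stream().toArray();
        String[] s2 = list.stream().toArray(String[]::new);
        // list.stream().toArray(new String[0]);     // does not compile: no such overload

        System.out.println(c1.length + " " + s1.length);
        System.out.println(Arrays.toString(c2) + " " + Arrays.toString(c3) + " " + Arrays.toString(s2));
    }
}
```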

--> What is the reason for the omission of toArray(T[] a) in Stream?

I imagine the main reason is the functional spirit of streams, which makes a side effect such as writing to an existing array undesirable. Is this correct? Are there other reasons that make the other two versions preferable? Are there perhaps even reasons that make them preferable on a Collection as well?

`toArray(T[])` was always a bad idea; there was just no better alternative until lambdas were introduced. — shmosel, Jan 27 '22 at 20:25
I also hesitated to close it as opinion-based, but I think there are very valid reasons for not doing it, so I chose to put them in an answer instead. — Didier L, Jan 27 '22 at 22:23

Didier L · Accepted Answer · 2022-01-28T18:19:12.923

If we consider the specification of Collection.toArray(T[]):

[…] If the collection fits in the specified array, it is returned therein. Otherwise, a new array is allocated with the runtime type of the specified array and the size of this collection.
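The two branches of that contract can be observed directly. The following is a small sketch using an ArrayList. The null written after the last element comes from the part of the Collection.toArray(T[]) documentation that is elided in the quote above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ToArrayFit {
    public static void main(String[] args) {
        List<String> list = new ArrayList<>(List.of("a", "b"));

        // Large enough: the very same array instance is filled and returned.
        String[] big = {"x", "x", "x", "x"};
        String[] r1 = list.toArray(big);
        System.out.println(r1 == big);            // true
        System.out.println(Arrays.toString(r1));  // [a, b, null, x]

        // Too small: a new array with the same runtime type is allocated instead.
        String[] small = new String[1];
        String[] r2 = list.toArray(small);
        System.out.println(r2 == small);          // false
        System.out.println(r2.length);            // 2
    }
}
```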

However, there are some issues that would prevent doing the same with streams:

an implementation cannot know in advance whether a stream of unknown size will fit
with parallel streams, splitting a stream (Spliterator) of known size (i.e. SIZED) could lead to 2 spliterators of unknown size if the original spliterator is not also SUBSIZED, so you wouldn’t know where to put the data after splitting

In both cases, an implementation would still have to create new arrays and finish by copying the data, defeating the purpose of the above requirement.
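To illustrate the extra copy, here is a rough sketch of how such a method could be emulated on top of the existing API. The helper `toArrayInto` is hypothetical and not part of the JDK; it simply buffers the stream first and then applies the Collection.toArray(T[]) rules.

```java
import java.util.Arrays;
import java.util.stream.Stream;

class StreamIntoArray {
    // Hypothetical helper (not a JDK method): emulates Collection.toArray(T[]) for a stream.
    @SuppressWarnings("unchecked")
    static <T> T[] toArrayInto(Stream<? extends T> stream, T[] target) {
        // The element count is only known once the stream has been consumed,
        // so the elements are buffered into a temporary array first.
        Object[] buffered = stream.toArray();
        if (buffered.length > target.length) {
            // Does not fit: allocate a new array with the runtime type of target.
            return (T[]) Arrays.copyOf(buffered, buffered.length, target.getClass());
        }
        // Fits: copy the buffered elements into the caller's array.
        System.arraycopy(buffered, 0, target, 0, buffered.length);
        if (target.length > buffered.length) {
            target[buffered.length] = null; // null-terminate, as Collection.toArray(T[]) does
        }
        return target;
    }

    public static void main(String[] args) {
        String[] target = new String[3];
        String[] result = toArrayInto(Stream.of("a", "bb", "ccc").filter(s -> s.length() > 1), target);
        System.out.println(result == target);            // true
        System.out.println(Arrays.toString(result));     // [bb, ccc, null]
    }
}
```

Either way an intermediate array is created and copied, so nothing is gained over calling toArray(IntFunction<T[]>) directly.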

Since in many scenarios you are working with streams of unknown size (a simple filter() or flatMap() removes the SIZED property), you will often run into the above limitations.
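The loss of the known size can be checked through the stream's spliterator. This is a small sketch; the helper `isSized` is only introduced for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Spliterator;
import java.util.stream.Stream;

class SizedDemo {
    // Illustrative helper: reports whether the stream's spliterator knows its exact size.
    static boolean isSized(Stream<?> stream) {
        return stream.spliterator().hasCharacteristics(Spliterator.SIZED);
    }

    public static void main(String[] args) {
        List<Integer> list = Arrays.asList(1, 2, 3);
        System.out.println(isSized(list.stream()));                             // true
        System.out.println(isSized(list.stream().filter(i -> i > 1)));          // false
        System.out.println(isSized(list.stream().flatMap(i -> Stream.of(i))));  // false
    }
}
```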

Moreover, even for the known size case, people were often allocating a new array of the right size at the time of calling Collection.toArray(T[]). This was actually counter-productive in more recent versions of the JVM, so it would be a bad thing to bring the same issue in the Stream API.

In the end, if we remove the requirement to fill the provided array, there does not seem to be much benefit left over the toArray(IntFunction<T[]>) version.

@Holger the question I linked did not consider the array creation through `IntFunction` and, as I had understood it, the overhead of reflection was completely compensated by avoiding the zeroing. I hadn’t dug into that linked article which clearly shows that reflective array creation is the same as _lang_ (although not evaluating the full `Array.newInstance(a.getClass().getComponentType(), size)` for that test, but ok). Anyway, the difference can only be negligible, and I shouldn’t have written that without proper benchmark. It was just a last minute, unthinking addition I put in my answer . — Didier L, Jan 28 '22 at 18:46
Never mind. We fall into that trap too often. It’s just important that we fix it when we notice. — Holger, Jan 31 '22 at 09:40

score 0 · Answer 2 · edited Jan 27 '22 at 20:41


The common pattern (at least according to my observations) for the use of Collection.toArray(T[]) was this one:

var array = list.toArray( new T[0] );

meaning that an empty array with size 0 was provided.

Calling toArray() with an existing array made it possible to return an array of the proper type, as opposed to an array of Object. There was no other way to pass the desired return type into the method than via a 'sample'. You could think of something like toArray( Class<T> elementType ) instead, but that does not work properly if T is itself a parameterised type.
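For illustration only, such a Class-based variant could be sketched as a static helper. This method is hypothetical and does not exist in the JDK; it relies on reflective array creation and an unchecked cast.

```java
import java.lang.reflect.Array;
import java.util.Collection;
import java.util.List;

class ClassBasedToArray {
    // Hypothetical alternative to toArray(T[]): the element type is passed as a Class token.
    @SuppressWarnings("unchecked")
    static <T> T[] toArray(Collection<? extends T> c, Class<T> elementType) {
        // Create an array of the requested component type and exactly the right size.
        T[] array = (T[]) Array.newInstance(elementType, c.size());
        return c.toArray(array);
    }

    public static void main(String[] args) {
        String[] strings = toArray(List.of("a", "b"), String.class);
        System.out.println(strings.getClass().getSimpleName() + " " + strings.length); // String[] 2
    }
}
```

For a parameterised element type such as List<String> there is no matching class literal, so only the raw List.class could be passed, which is the limitation mentioned above.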

When lambdas were introduced, the variant with the IntFunction<T[]> replaced the variant with the existing array in its functionality; therefore it was omitted for Stream. I expect that Collection.toArray(T[]) will be deprecated soon and removed with one of the upcoming LTS versions – not with the next, or the one after the next, of course!

edited Jan 27 '22 at 20:41 by Pshemo · answered Jan 27 '22 at 20:25 by tquadrat

I remember reading that they didn't want to add the Collection overload because it could make existing code ambiguous (e.g. `toArray(null)`). I wonder why they decided to add it in Java 11. – shmosel Jan 27 '22 at 21:12
@shmosel – Sorry, I lost you! What did they add in Java 11? The `Stream.toArray()` method with the generator argument? Or what? – tquadrat Jan 27 '22 at 22:11
https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/Collection.html#toArray(java.util.function.IntFunction) – shmosel Jan 27 '22 at 23:04

Whether `toArray( Class<T> elementType )` would work well when `T` is a parameterized type, doesn’t matter, as `new T[0]` also doesn’t work well in that case. But `toArray(Class<T>)` would allow passing in `void.class` or `int.class`, whereas `toArray(T[])` rules out these cases at compile time already. But `toArray(Class<T[]> arrayType )` would have worked and [`Arrays.copyOf(…)`](https://docs.oracle.com/javase/8/docs/api/java/util/Arrays.html#copyOf-U:A-int-java.lang.Class-) does use that pattern. – Holger Jan 28 '22 at 17:19

…So the reason to offer `toArray(T[])` was not that passing a `Class` wouldn’t work equally well for the zero length array case, but [as the elaborated contract indicates](https://docs.oracle.com/javase/8/docs/api/java/util/Collection.html#toArray-T:A-) passing in a non-zero length array was a considered use case by the time the API was created. I suppose, we all agree *from today’s perspective* that this is rather a rare corner case. – Holger Jan 28 '22 at 17:23
