The main reason to have several different private implementations of List
is to save space.
Consider an implementation that stores its elements in an array. (This is essentially what ListN
does.) In Hotspot (64-bit with compressed object pointers, each 4 bytes) each object requires a 12-byte header. The ListN
object has a single field containing the array, for a total of 16 bytes. An array is a separate object, so that has another 12-byte header plus a 4-byte length. That's another 16 bytes, not counting any actual elements stored. If we're storing two elements, they take 8 bytes. That brings the total to 40 bytes for storing a two-element list. That's quite a bit of overhead!
If we were to store the elements of a small list in fields instead of an array, that object would have a header (12 bytes) plus two fields (8 bytes) for a total of 20 bytes -- half the size. For small lists, there's a considerable savings with storing elements in fields of the List
object itself instead of in an array, which is a separate object. This is what the old List2
implementation did. It's recently been superseded by the List12
implementation, which can store lists of one or two elements in fields.
Now, in the API there are 12 overloaded List.of()
methods: zero to ten fixed args plus varargs. Shouldn't there be corresponding List0
through List10
and ListN
implementations?
There could be, but there doesn't necessarily have to be. An early prototype of these implementations had the optimized small list implementations tied to the APIs. So the zero, one, and two fixed arg of()
methods created instances of List0
, List1
, and List2
, and the varargs List.of()
method created an instance of ListN
. This was fairly straightforward, but it was quite restrictive. We wanted to be able to add, remove, or rearrange implementations at will. It's considerably more difficult to change APIs, since we have to remain compatible. Thus, we decided to decouple things so that the number of arguments in the APIs was largely independent of the implementation instantiated underneath.
In JDK 9 we ended up with the 12 overloads in the API, but only four implementations: field-based implementations holding 0, 1, and 2 elements, and an array-based implementation holding an arbitrary number. Why not add more field-based implementations? Diminishing returns and code bloat. Most lists have few elements, and there's an exponential dropoff in the occurrences of lists as the number of elements gets larger. The space savings get relatively smaller compared to an array-based implementation. Then there's the matter of maintaining all those extra implementations. Either they'd have to be entered directly in the source code (bulky) or we'd switch over to a code generation scheme (complex). Neither seemed justified.
Our startup performance guru Claes Redestad did some measurements and found that there was a speedup in having fewer list implementations. The reason is megamorphic dispatch. Briefly, if the JVM is compiling the code for a virtual call site and it can determine that only one or two different implementations are called, it can optimize this well. But if there are many different implementations that can be called, it has to go through a slower path. (See this article for Black Magic details.)
For the list implementations, it turns out that we can get by with fewer implementations without losing much space. The List1
and List2
implementations can be combined into a two-field List12
implementation, with the second field being null if there's only one element. We only need one zero-length list, since it's immutable! For a zero-length list, we can get rid of List0
just use a ListN
with a zero-length array. It's bigger than an old List0
instance, but we don't care, since there's only one of them.
These changes just went into the JDK 11 mainline. Since the API is completely decoupled from the implementations, there is no compatibility issue.
There are additional possibilities for future enhancements. One potential optimization is to fuse an array onto the end of an object, so the object has a fixed part and a variable-length part. This will avoid the need for an array object's header, and it will probably improve locality of reference. Another potential optimization is with value types. With value types, it might be possible to avoid heap allocation entirely, at least for small lists. Of course, this is all highly speculative. But if new features come along in the JVM, we can take advantage of them in the implementations, since they're are entirely hidden behind the API.