I have sort of a hard time understanding an implementation detail from java-9 ImmutableCollections.SetN
; specifically why is there a need to increase the inner array twice.
Suppose you do this:
Set.of(1,2,3,4) // 4 elements, but internal array is 8
More exactly I perfectly understand why this is done (a double expansion) in case of a HashMap
- where you never (almost) want the load_factor
to be one. A value of !=1
improves search time as entries are better dispersed to buckets for example.
But in case of an immutable Set - I can't really tell. Especially since the way an index of the internal array is chosen.
Let me provide some details. First how the index is searched:
int idx = Math.floorMod(pe.hashCode() ^ SALT, elements.length);
pe
is the actual value we put in the set. SALT
is just 32 bits generated at start-up, once per JVM
(this is the actual randomization if you want). elements.length
for our example is 8
(4 elements, but 8 here - double the size).
This expression is like a negative-safe modulo operation. Notice that the same logical thing is done in HashMap
for example ((n - 1) & hash
) when the bucket is chosen.
So if elements.length is 8
for our case, then this expression will return any positive value that is less than 8 (0, 1, 2, 3, 4, 5, 6, 7)
.
Now the rest of the method:
while (true) {
E ee = elements[idx];
if (ee == null) {
return -idx - 1;
} else if (pe.equals(ee)) {
return idx;
} else if (++idx == elements.length) {
idx = 0;
}
}
Let's break it down:
if (ee == null) {
return -idx - 1;
This is good, it means that the current slot in the array is empty - we can put our value there.
} else if (pe.equals(ee)) {
return idx;
This is bad - slot is occupied and the already in place entry is equal to the one we want to put. Set
s can't have duplicate elements - so an Exception is later thrown.
else if (++idx == elements.length) {
idx = 0;
}
This means that this slot is occupied (hash collision), but elements are not equal. In a HashMap
this entry would be put to the same bucket as a LinkedNode
or TreeNode
- but not the case here.
So index
is incremented and the next position is tried (with the small caveat that it moves in a circular way when it reaches the last position).
And here is the question: if nothing too fancy (unless I'm missing something) is being done when searching the index, why is there a need to have an array twice as big? Or why the function was not written like this:
int idx = Math.floorMod(pe.hashCode() ^ SALT, input.length);
// notice the diff elements.length (8) and not input.length (4)