If there are often a lot of collisions in the same bucket, then using a set is more efficient than using a list, and indeed some Java hash table implementations have adopted sets for this reason. Vectors, however, can't be used for std::unordered_map or std::unordered_set implementations: a vector has to reallocate to a different memory area when grown past its capacity, whilst the Standard requires that the elements in an unordered container are never moved by other operations on the container (references and pointers to them must remain valid).
That said, the nature of hash tables is that - with a high-quality hash function - the statistical distribution of the number of elements colliding in any particular bucket depends only on the load factor. If you can't trust the collisions not to get out of control, perhaps you shouldn't be using that hash function.
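A quick way to see the "depends only on the load factor" part: with a uniformly distributing hash, the number of elements sharing any one bucket is approximately Poisson-distributed with mean equal to the load factor, i.e. P(k) = e^-lf * lf^k / k!. A small sketch (the two load factors are arbitrary examples) that prints those probabilities:

```cpp
#include <cmath>
#include <cstdio>

int main() {
    const double load_factors[] = {0.5, 1.0};      // arbitrary examples
    for (double lf : load_factors) {
        std::printf("load factor %.1f:\n", lf);
        double factorial = 1.0;
        for (int k = 0; k <= 6; ++k) {
            if (k > 1) factorial *= k;             // build up k! incrementally
            double p = std::exp(-lf) * std::pow(lf, k) / factorial;
            std::printf("  buckets with %d element(s): %5.2f%%\n", k, p * 100);
        }
    }
}
```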
Some details: Standard-library unordered containers have a default max_load_factor() (load_factor() is the ratio of size() to bucket_count()) of 1.0, and with a strong pseudo-randomizing hash function they'll have 1/e ~= 36.8% of buckets empty, as many with one element, half that with 2 elements (~18.4%), a third of that with 3 elements (~6.13%), a quarter of that with 4 elements (~1.53%), a fifth of that with 5 elements (~0.31%), and a sixth of that with 6 elements (~0.05%). As you can hopefully see, it's incredibly rare to have to search through many elements (even in the worst-case scenario where the hash table is at its max load factor), so a list approach is usually adequate.
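If you want to check those figures empirically, a rough sketch along these lines (the element count and seed are arbitrary, and the exact output depends on your implementation's bucket sizing) tallies bucket_size() across a large std::unordered_set of random keys:

```cpp
#include <cstdio>
#include <random>
#include <unordered_set>
#include <vector>

int main() {
    std::unordered_set<unsigned long long> s;
    std::mt19937_64 rng(12345);      // fixed seed so runs are repeatable
    const std::size_t n = 1'000'000; // arbitrary element count
    s.reserve(n);                    // bucket_count() >= n / max_load_factor()
    while (s.size() < n)
        s.insert(rng());             // random keys stand in for a strong hash

    // Count how many buckets hold 0, 1, 2, ... elements.
    std::vector<std::size_t> histogram;
    for (std::size_t b = 0; b < s.bucket_count(); ++b) {
        std::size_t len = s.bucket_size(b);
        if (len >= histogram.size()) histogram.resize(len + 1, 0);
        ++histogram[len];
    }
    std::printf("load_factor() = %.3f\n", s.load_factor());
    for (std::size_t k = 0; k < histogram.size(); ++k)
        std::printf("%zu element(s): %5.2f%% of buckets\n",
                    k, 100.0 * histogram[k] / s.bucket_count());
}
```

The printed percentages should track the Poisson terms for whatever load_factor() it reports; implementations round the bucket count up when you reserve, so the load factor will typically come out a little under 1.0.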