- You don't care about ordering
- Your key is already random
- 10 million items
The short answer
A hash table will probably be the best for your case.
Speed
A hash table (std::unordered_map
) will be O( 1 ) if hashing is constant. In your case, O( 1 ) holds because you don't even need to hash—just using the lower 32 bits of the random UUID should be good enough. The cost of a lookup will be similar to one or two pointer indirections.
A binary tree (std::map
) will be O( log2 n ), so for 10 million items you'll have 24 comparisons and 24 potential cache misses. Even for n = 4,000 it'll use 12 comparisons, so it very quickly becomes significantly worse than a hash table.
A radix tree will be O( k ), so you'll have a maximum of k comparisons and k potential cache misses. At a very unlikely best, the radix tree will be as fast as a hash table. At worse (assuming k = a somewhat reasonable 16, for a 256-way tree) it'll perform better than a binary tree but far worse than a hash table.
So if speed is top priority, use a hash table.
Overhead
A typical hash table will have around 1–3 pointers of overhead per item if full. If not full, you'll probably be wasting 1 pointer of space per empty slot. You should be able to keep it nearly full while still being faster than a binary tree because you've got a very random key, but for maximum possible speed you'll of course want to give it plenty of headroom. For 10 million items on a 32-bit machine, expect 38–114MiB of overhead for a full table. For a half-full table, expect 76–153MiB.
A red-black tree, the most common std::map
implementation, will have 3 pointers + 1 bool per item. Some implementations exploit pointer alignment to merge the bool with one of the pointers. Depending on implementations and how full the hash table is, a red-black tree might have slightly lower overhead. Expect 114–153MiB.
A radix tree will have 1 pointer per item and 1 pointer per empty slot. Unfortunately I think such large random keys will cause you to have very many empty slots toward the edge of a tree, so it will probably use more memory than either of the above. Decreasing k can lower this overhead but will similarly lower performance.
If minimum overhead is important, use a hash table or binary tree. If it's a priority, use a full hash table.
Note that std::unordered_map
does not let you control when it will resize, so getting one full will be difficult. Boost Intrusive has a very nice unordered_map
implementation that will put you directly in control of that and many other things.