Problem
I have a huge collection of strings that are duplicated among some objects. What is need is string interning. These objects are serialized and deserialized with protobuf-net
. I know it should handle .NET string intering, but my tests have shown that taking all those strings myself and creating a Dictionary<string, int>
(mapping between a value and its unique identifier), replacing original string values by ints, gives better results.
The problem, though, is in the mapping. It is only one-way searchable (I mean O(1)-searchable). But I would like to search by key or by value in O(1). Not just by key.
Approach
The set of strings is fixed. This sounds like an array. Search by value is O(1), blinding fast. Not even amortized as in the dictionary - just constant, by the index.
The problem with an array is searching by keys. This sounds like hashes. But hey, n
hashes aren't said to be evenly distributed among exactly n
cells of the n
-element array. Using modulo, this will likely lead to collisions. That's bad.
I could create, let's say, an n * 1.1
-length array, and try random hashing functions until I get no collisions but... that... just... feels... wrong.
Question
How can I solve the problem and achieve O(1) lookup time both by keys (strings) and values (integers)?
Two dictionaries is not an option ;)