I have a relatively large set of data that lends itself very naturally to C#'s Dictionary object. Currently, I have 102400 key-value pairs that are generated semi-dynamically when my program starts up. My problem is that I have to run a great number of lookups against this data as fast as possible.
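For concreteness, here's a minimal sketch of what I'm doing now; the int keys and the literal entries are just placeholders for the data my program actually generates:

    using System;
    using System.Collections.Generic;

    class Baseline
    {
        static void Main()
        {
            // Flat map: one entry per key-value pair (~102400 in my real data).
            var map = new Dictionary<int, int>
            {
                { 1, 1 }, { 2, 2 }, { 3, 3 }, { 4, 1 },
                // ...generated semi-dynamically at startup
            };

            // Each lookup is a single indexer call.
            Console.WriteLine(map[4]); // prints 1
        }
    }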
According to This Page, lookup speed is directly influenced by the number of key-value pairs in the dictionary. My data is a bit odd in that a great number of different keys lead to the same value. In fact, I have only 4900 distinct values, which means I average about 20 key-value pairs per distinct value.
My first instinct was to swap the keys and values (since I only care about the distinct values) and store the old keys in a list or array as the new values. This cut the dictionary down from 102400 key-value pairs to 4900, but I can't see any efficient way to search all of those lists for a specific old key in order to recover the value it maps to (illustrated in the sketch after the mock-up below).
I know that my description probably got a bit difficult to follow once I switched the keys and values, so I've included a mock-up of my data to show you what I mean:
Old method:

Key   Value
---   -----
 1      1
 2      2
 3      3
 4      1
 5      3
 6      2
 7      2
 8      1
 9      3
10      2
11      3
12      1
New structure:

Key   Value
---   -----
 1    {1,4,8,12}
 2    {2,6,7,10}
 3    {3,5,9,11}
In my program, I'm going to be given '11' and I'll need to return '3'. The first structure makes that a simple lookup, but the dictionary is huge and that seems to be slowing things down; the second adds so much logical overhead in tracking down which value list holds the key that my attempt at implementing it was actually slower.
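To make that overhead concrete, here's a sketch of the reverse lookup against the second structure; FindValue is just an illustrative helper, not my real code:

    using System;
    using System.Collections.Generic;

    class InvertedLookup
    {
        // Given an original key (e.g. 11), scan for the list that contains it
        // and return that list's key. Worst case this walks every list, so the
        // lookup is O(number of original keys), not the Dictionary's O(1).
        static int FindValue(Dictionary<int, List<int>> inverted, int originalKey)
        {
            foreach (var pair in inverted)
            {
                if (pair.Value.Contains(originalKey))
                    return pair.Key;
            }
            throw new KeyNotFoundException($"No list contains {originalKey}");
        }

        static void Main()
        {
            // Inverted map: distinct value -> original keys (~4900 entries).
            var inverted = new Dictionary<int, List<int>>
            {
                { 1, new List<int> { 1, 4, 8, 12 } },
                { 2, new List<int> { 2, 6, 7, 10 } },
                { 3, new List<int> { 3, 5, 9, 11 } },
            };

            Console.WriteLine(FindValue(inverted, 11)); // prints 3
        }
    }

Even though the outer dictionary only has 4900 entries, every failed Contains check walks an entire list, so in the worst case this still touches all 102400 original keys.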
Am I barking up the wrong tree here? Should I just accept the speed of the larger dictionary, or is there some other way I can store my data to increase the lookup speed?