6

I want to seek through massive data, grouped with multiple keys, in the fastest way possible. I have a file with this information but I want to load it in-memory. Memory capacitance is not a problem.

key1 | key2 | key3 | key4 | value1 | value2
-----|------|------|------|--------|--------
1    | 1    | 1    | 1    | str    | 20
1    | 1    | 1    | 2    | str    | 20
1    | 1    | 1    | 3    | str    | 20
1    | 1    | 2    | 1    | str    | 20
2    | 1    | 1    | 1    | str    | 20

I have looked at some collections but I'm still uncertain : enter image description here http://blog.bodurov.com/Performance-SortedList-SortedDictionary-Dictionary-Hashtable

Maybe a multikey dictionary will be better because it avoid a lots of redundancy in the keys.

public class MultiKeyDictionary<T1, T2, T3> : Dictionary<T1, Dictionary<T2, T3>>


key1 | key2 | key3 | key4 | value1 | value2
-----|------|------|------|--------|--------
1    | 1    | 1    | 1    | str    | 20
     |      |      | 2    | str    | 20
     |      |      | 3    | str    | 20
     |      | 2    | 1    | str    | 20
2    | 1    | 1    | 1    | str    | 20

I will not seek every keys but maybe 50% of them. I'm open to even insane suggestions.

Chris Ballance
  • 33,810
  • 26
  • 104
  • 151
Adam Paquette
  • 1,243
  • 1
  • 14
  • 28
  • Can you materialize multiple copies of the data independently optimized for each search you will do? Indexed for Key1, indexed for key2, etc... – Chris Ballance Apr 08 '14 at 20:27
  • 2
    Can you be more specific about the concrete queries you need to execute against this data? – usr Apr 08 '14 at 20:28
  • I can split the data by key from left to right but to access key 2 I need first to access key 1. So every right key is in a collection from left key. key4 is in key3 and key3 is in key2... – Adam Paquette Apr 08 '14 at 20:31
  • Let say my keys are Country -> State -> City -> House. They all from a single key. – Adam Paquette Apr 08 '14 at 20:39
  • 1
    There is a related post with quite a few hints: http://stackoverflow.com/questions/4681526/system-collections-generic-dictionary-ultimate-performance What about the size limit of 2GB? Do you need concurrent access? Are your keys unique? It might be worthwhile to parallelize your queries, to partition your data, etc. Lots of possible things to dig into. – Axel Kemper Apr 08 '14 at 21:02
  • Assuming that a) the keys form a natural hierachy, as their names suggest, b) memory really is no problem and c) you'd be willing to invest in longer loading time for best performance, I would not generate one datastructure but four with 1,2,3 and 4 keys. So I'd go for 4 SortedDictionaries. – TaW Apr 08 '14 at 21:40
  • From the benchmark I posted, SortedDictionnary is not good. Also from : http://stackoverflow.com/questions/2376459/unexpected-poor-performance-of-sorteddictionary-compared-with-dictionary – Adam Paquette Apr 08 '14 at 21:48

1 Answers1

1

You could simply use a Tuple of your keys for the dictionary key and for the value.

var bank = new Dictionary<Tuple<int, int, int, int, int>, Tuple<string, int>>();

bank.Add(Tuple.Create(k1, k2, k3, k4), Tuple.Create("str", 20));
BenVlodgi
  • 2,135
  • 1
  • 13
  • 26
  • no because if he want's to find matches for only 1 key he cant, you're just combining all the keys into a single key, restricting him to a single key for each record which is not what the question is about. – Eluvatar Apr 08 '14 at 20:29
  • @Eluvatar In that case he could just use Dictionary> bank, and have each key point to the same Tuple value – BenVlodgi Apr 08 '14 at 20:30
  • 1
    Key1-4 make a single key. I want to avoid redundancy in keys with something like Dictionary>... but performance is crucial so... – Adam Paquette Apr 08 '14 at 20:36