What is the fastest way to store information with two integer keys?

Question

I have got tons of objects, which have two Int32 Keys to identify them in a collection. The question is the choice of collection. The only criteria is pure performance for accessing/"reading" an object(no need to be writable).

Stuff I analyzed:

I didn't try hashtables cause of this. I had got the idea to try around with following types:

Dictionary<Tuple<int,int>,object>
Dictionary<long,object> // putting the bytes of two ints together by first*(int.MaxValue+1L)+second
Dictionary<int, List<Tuple<int,object>>
Dictionary<int, Dictionary<int,object>>


List<Tuple<int,int,object>>
List<Tuple<long,object>>
List<Tuple<int,Tuple<int,object>>

I filled them with 1000 values for the first and 100 values for the second integer. The "object" for testing was firstInt.ToString()+secondInt.ToString()

I measured using following pattern:

        System.Diagnostics.Stopwatch st = new System.Diagnostics.Stopwatch();
        bool b=true;
        st.Start();
        for(int i=0;i<100;i++)
        {
            for (int ii = 0; ii< 1000; ii++)
            {   //for making sure code optimization doesn't jump over it
                b=b&*accessor*.Contains('a');
            }
        }
        st.Stop();
        Console.WriteLine("Result: "+st.ElapsedMilliseconds);

I used following accessors:

     Dictionary<Tuple<int,int>,object> dir [...]
     Accessor:  dir[a][b]
     Result: 211ms

     Dictionary<long,object> dir [...]
     Accessor: dir[merge(a,b)]
     Result: 70ms

     //method for sticking two ints together
     public static long merge(int a,int b)
     { return (int.MaxValue+1L)*a+b;}

     Dictionary<int, List<Tuple<int,object>> dir[...]
     Accessor:   dir[a].Find(x=>x.Item1==b).Item2
     Result:  15ms

     Dictionary<int, Dictionary<int,object>> dir[...]
     Accessor:   dir[a][b]
     Result:  22ms

     List<Tuple<int,int,object>> dir[...]
     Accessor: dir.Find(x=>x.Item1==a&&x.Item2==b).Item3
     Result: over60s

     List<Tuple<long,object>> dir[...]
     Accessor: dir.Find(x=>x.Item1==merge(a,b)).Item2
     Result: over60s

     List<Tuple<int,Tuple<int,object>> dir[...]
     Accessor: dir.Find(x=>x.Item==a).Item2.Find(x=>x.Item==b).Item2
     Result: over60s

Are there better/faster types? any recommendations about improving the performance? or is there a quite different solution state-of-the-art?

How much is a ton of data ? Do you have 100 keys ? 100 000 000 ? — nos, Dec 06 '15 at 02:41
@Cyral how should I use `Hashset` ? should I try `Hashset ?` — leAthlon, Dec 06 '15 at 02:45
@leAthlon Nevermind, for some reason I thought a HashSet was like a dictionary and not a list. In this case, you will need something like a dictionary because it is designed for lookups, using a list will be much slower (as you can tell), because you are trying to use it as a dictionary. — Cyral, Dec 06 '15 at 02:47
1000-10000 values is a small amount. So you can use a dictionary. But if you are looking at millions of values, you will have to use something different and have a different approach to process them. — Kosala W, Dec 06 '15 at 02:57
I'd at least try out making a custom struct similar to the 211ms `Tuple` option with an overloaded/optimized `GetHashCode()` implementation. I don't really expect it to work, but you should be able to approach the 70ms timeframe with the 211ms semantics. — Joel Coehoorn, Dec 06 '15 at 03:03

Ruslan · Accepted Answer · 2015-12-06T04:38:36.433

2

Technically, your second method (Dictionary<long,object>) should be the fastest. I think that the biggest issue is the overhead of the method call (which might not be getting inlined by the JIT). There is also a faster way to get the combined long from the two int values.

Instead of doing this

Accessor: dir[merge(a,b)]

Try this:

Accessor: dir[(((ulong)(uint)a) << 32) | (uint) b)]

Also, according to @JoelCoehoorn's comment below, you can extract the merging process into a separate method and force inlining using the following attribute: https://msdn.microsoft.com/en-us/library/system.runtime.compilerservices.methodimploptions(v=VS.110).aspx. However, I'd run the benchmark using both ways just to make sure there are no side-effects.

Please post the speed result in the comments; I'd be very curious to see.

edited Dec 06 '15 at 04:38

answered Dec 06 '15 at 02:59

Ruslan

2,691
1
19
29

1

He can still use a function with the new (.Net 4.5) inlining hint: http://stackoverflow.com/questions/473782/inline-functions-in-c – Joel Coehoorn Dec 06 '15 at 03:00
@JoelCoehoorn Good point! I wasn't aware of that. Answer updated. – Ruslan Dec 06 '15 at 03:01
1

That should be `dir[(((ulong)(uint)a) << 32) | (uint) b)]`. Also the code in the question was wrong because it only shifted by 31 instead of 32. `(int.MaxValue + 1)` vs `(uint.MaxValue + 1)` – Ben Voigt Dec 06 '15 at 03:55
@BenVoigt Corrected, thanks! As for the second point, yes, I've noticed that too, but I suppose it bears little relevance since he was only benchmarking and not doing actual lookups; further, that's the part rewritten in the answer anyway. – Ruslan Dec 06 '15 at 04:39

What is the fastest way to store information with two integer keys?

1 Answers1