2

I have got tons of objects, which have two Int32 Keys to identify them in a collection. The question is the choice of collection. The only criteria is pure performance for accessing/"reading" an object(no need to be writable).


Stuff I analyzed:

I didn't try hashtables cause of this. I had got the idea to try around with following types:

Dictionary<Tuple<int,int>,object>
Dictionary<long,object> // putting the bytes of two ints together by first*(int.MaxValue+1L)+second
Dictionary<int, List<Tuple<int,object>>
Dictionary<int, Dictionary<int,object>>


List<Tuple<int,int,object>>
List<Tuple<long,object>>
List<Tuple<int,Tuple<int,object>>

I filled them with 1000 values for the first and 100 values for the second integer. The "object" for testing was firstInt.ToString()+secondInt.ToString()

I measured using following pattern:

        System.Diagnostics.Stopwatch st = new System.Diagnostics.Stopwatch();
        bool b=true;
        st.Start();
        for(int i=0;i<100;i++)
        {
            for (int ii = 0; ii< 1000; ii++)
            {   //for making sure code optimization doesn't jump over it
                b=b&*accessor*.Contains('a');
            }
        }
        st.Stop();
        Console.WriteLine("Result: "+st.ElapsedMilliseconds);

I used following accessors:

     Dictionary<Tuple<int,int>,object> dir [...]
     Accessor:  dir[a][b]
     Result: 211ms

     Dictionary<long,object> dir [...]
     Accessor: dir[merge(a,b)]
     Result: 70ms

     //method for sticking two ints together
     public static long merge(int a,int b)
     { return (int.MaxValue+1L)*a+b;}

     Dictionary<int, List<Tuple<int,object>> dir[...]
     Accessor:   dir[a].Find(x=>x.Item1==b).Item2
     Result:  15ms

     Dictionary<int, Dictionary<int,object>> dir[...]
     Accessor:   dir[a][b]
     Result:  22ms

     List<Tuple<int,int,object>> dir[...]
     Accessor: dir.Find(x=>x.Item1==a&&x.Item2==b).Item3
     Result: over60s

     List<Tuple<long,object>> dir[...]
     Accessor: dir.Find(x=>x.Item1==merge(a,b)).Item2
     Result: over60s

     List<Tuple<int,Tuple<int,object>> dir[...]
     Accessor: dir.Find(x=>x.Item==a).Item2.Find(x=>x.Item==b).Item2
     Result: over60s

Are there better/faster types? any recommendations about improving the performance? or is there a quite different solution state-of-the-art?

leAthlon
  • 734
  • 1
  • 10
  • 22
  • 2
    How much is a ton of data ? Do you have 100 keys ? 100 000 000 ? – nos Dec 06 '15 at 02:41
  • @nos 2 keys per value, between 1000 and 10 000 values – leAthlon Dec 06 '15 at 02:42
  • @Cyral how should I use `Hashset` ? should I try `Hashset ?` – leAthlon Dec 06 '15 at 02:45
  • @leAthlon Nevermind, for some reason I thought a HashSet was like a dictionary and not a list. In this case, you will need something like a dictionary because it is designed for lookups, using a list will be much slower (as you can tell), because you are trying to use it as a dictionary. – Cyral Dec 06 '15 at 02:47
  • 1
    1000-10000 values is a small amount. So you can use a dictionary. But if you are looking at millions of values, you will have to use something different and have a different approach to process them. – Kosala W Dec 06 '15 at 02:57
  • I'd at least try out making a custom struct similar to the 211ms `Tuple` option with an overloaded/optimized `GetHashCode()` implementation. I don't really expect it to work, but you should be able to approach the 70ms timeframe with the 211ms semantics. – Joel Coehoorn Dec 06 '15 at 03:03

1 Answers1

2

Technically, your second method (Dictionary<long,object>) should be the fastest. I think that the biggest issue is the overhead of the method call (which might not be getting inlined by the JIT). There is also a faster way to get the combined long from the two int values.

Instead of doing this

Accessor: dir[merge(a,b)]

Try this:

Accessor: dir[(((ulong)(uint)a) << 32) | (uint) b)]

Also, according to @JoelCoehoorn's comment below, you can extract the merging process into a separate method and force inlining using the following attribute: https://msdn.microsoft.com/en-us/library/system.runtime.compilerservices.methodimploptions(v=VS.110).aspx. However, I'd run the benchmark using both ways just to make sure there are no side-effects.

Please post the speed result in the comments; I'd be very curious to see.

Ruslan
  • 2,691
  • 1
  • 19
  • 29
  • 1
    He can still use a function with the new (.Net 4.5) inlining hint: http://stackoverflow.com/questions/473782/inline-functions-in-c – Joel Coehoorn Dec 06 '15 at 03:00
  • @JoelCoehoorn Good point! I wasn't aware of that. Answer updated. – Ruslan Dec 06 '15 at 03:01
  • 1
    That should be `dir[(((ulong)(uint)a) << 32) | (uint) b)]`. Also the code in the question was wrong because it only shifted by 31 instead of 32. `(int.MaxValue + 1)` vs `(uint.MaxValue + 1)` – Ben Voigt Dec 06 '15 at 03:55
  • @BenVoigt Corrected, thanks! As for the second point, yes, I've noticed that too, but I suppose it bears little relevance since he was only benchmarking and not doing actual lookups; further, that's the part rewritten in the answer anyway. – Ruslan Dec 06 '15 at 04:39