99

I have some objects in List, let's say List<MyClass> and MyClass has several properties. I would like to create an index of the list based on 3 properties of of MyClass. In this case 2 of the properties are int's, and one property is a datetime.

Basically I would like to be able to do something like:

Dictionary< CompositeKey , MyClass > MyClassListIndex = Dictionary< CompositeKey , MyClass >();
//Populate dictionary with items from the List<MyClass> MyClassList
MyClass aMyClass = Dicitonary[(keyTripletHere)];

I sometimes create multiple dictionaries on a list to index different properties of the classes it holds. I am not sure how best to handle composite keys though. I considered doing a checksum of the three values but this runs the risk of collisions.

AaronLS
  • 37,329
  • 20
  • 143
  • 202

10 Answers10

125

You should use tuples. They are equivalent to a CompositeKey class, but the Equals() and GetHashCode() are already implemented for you.

var myClassIndex = new Dictionary<Tuple<int, bool, string>, MyClass>();
//Populate dictionary with items from the List<MyClass> MyClassList
foreach (var myObj in myClassList)
    myClassIndex.Add(Tuple.Create(myObj.MyInt, myObj.MyBool, myObj.MyString), myObj);
MyClass myObj = myClassIndex[Tuple.Create(4, true, "t")];

Or using System.Linq

var myClassIndex = myClassList.ToDictionary(myObj => Tuple.Create(myObj.MyInt, myObj.MyBool, myObj.MyString));
MyClass myObj = myClassIndex[Tuple.Create(4, true, "t")];

Unless you need to customize the computation of the hash, it's simpler to use tuples.

If there are a lot of properties you want to include in the composite key, the Tuple type name can become pretty long, but you can make the name shorter by creating your own class deriving from Tuple<...>.


** edited in 2017 **

There is a new option starting with C# 7: the value tuples. The idea is the same, but the syntax is different, lighter:

The type Tuple<int, bool, string> becomes (int, bool, string), and the value Tuple.Create(4, true, "t") becomes (4, true, "t").

With value tuples, it also becomes possible to name the elements. Note that performances are slightly different, so you may want to do some benchmarking if they matter for you.

Eldritch Conundrum
  • 8,452
  • 6
  • 42
  • 50
  • 4
    Tuple is not a good candidate for a key as it creates a high number of hash collisions. http://stackoverflow.com/questions/12657348/new-keyvaluepairuint32-uint32i-j-gethashcode-high-rate-of-duplicates – paparazzo Jan 13 '14 at 14:47
  • 1
    @Blam `KeyValuePair` and other structs have a default hash function that is known to be bad (see http://stackoverflow.com/questions/3841602/why-is-valuetype-gethashcode-implemented-like-it-is for more details). `Tuple<>` however is not a ValueType, and its default hash function at least will use all the fields. That being said, if the main problem of your code is collisions, then do implement an optimized `GetHashCode()` that suits your data. – Eldritch Conundrum Jan 13 '14 at 16:02
  • 1
    Even though Tuple is not a ValueType from my testing it suffers from a a lot of collisions – paparazzo Jan 13 '14 at 16:21
  • Best solution for multiple keys as far as I know. Very short and convenient. How to check if there is value for given 2 keys? –  Dec 11 '15 at 06:27
  • Thanks this is insane fast compared to Linq. I am querying 100000 object in memory that has a composite key consisting of int and DateTime. without index 23 seconds. with index 0.93 second. – Martin Andersen Jun 21 '16 at 21:12
  • Does anybody know how do anonymous types compare to tuples for this scenario? – Luiso Feb 14 '17 at 20:35
  • 5
    I think this answer is out of date now that we have ValueTuples. They have nicer syntax in C#, and they seem to do GetHashCode twice as fast as Tuples -- https://gist.github.com/ljw1004/61bc96700d0b03c17cf83dbb51437a69 – Lucian Wischik Apr 20 '17 at 15:03
  • 3
    @LucianWischik Thank you, I have updated the answer to mention them. – Eldritch Conundrum Apr 21 '17 at 15:16
  • It's so annoying that `KeyedCollection` is abstract. – Shimmy Weitzhandler Jun 06 '17 at 06:58
24

The best way I could think of is to create a CompositeKey struct and make sure to override the GetHashCode() and Equals() methods in order to ensure speed and accuracy when working with the collection:

class Program
{
    static void Main(string[] args)
    {
        DateTime firstTimestamp = DateTime.Now;
        DateTime secondTimestamp = firstTimestamp.AddDays(1);

        /* begin composite key dictionary populate */
        Dictionary<CompositeKey, string> compositeKeyDictionary = new Dictionary<CompositeKey, string>();

        CompositeKey compositeKey1 = new CompositeKey();
        compositeKey1.Int1 = 11;
        compositeKey1.Int2 = 304;
        compositeKey1.DateTime = firstTimestamp;

        compositeKeyDictionary[compositeKey1] = "FirstObject";

        CompositeKey compositeKey2 = new CompositeKey();
        compositeKey2.Int1 = 12;
        compositeKey2.Int2 = 9852;
        compositeKey2.DateTime = secondTimestamp;

        compositeKeyDictionary[compositeKey2] = "SecondObject";
        /* end composite key dictionary populate */

        /* begin composite key dictionary lookup */
        CompositeKey compositeKeyLookup1 = new CompositeKey();
        compositeKeyLookup1.Int1 = 11;
        compositeKeyLookup1.Int2 = 304;
        compositeKeyLookup1.DateTime = firstTimestamp;

        Console.Out.WriteLine(compositeKeyDictionary[compositeKeyLookup1]);

        CompositeKey compositeKeyLookup2 = new CompositeKey();
        compositeKeyLookup2.Int1 = 12;
        compositeKeyLookup2.Int2 = 9852;
        compositeKeyLookup2.DateTime = secondTimestamp;

        Console.Out.WriteLine(compositeKeyDictionary[compositeKeyLookup2]);
        /* end composite key dictionary lookup */
    }

    struct CompositeKey
    {
        public int Int1 { get; set; }
        public int Int2 { get; set; }
        public DateTime DateTime { get; set; }

        public override int GetHashCode()
        {
            return Int1.GetHashCode() ^ Int2.GetHashCode() ^ DateTime.GetHashCode();
        }

        public override bool Equals(object obj)
        {
            if (obj is CompositeKey)
            {
                CompositeKey compositeKey = (CompositeKey)obj;

                return ((this.Int1 == compositeKey.Int1) &&
                        (this.Int2 == compositeKey.Int2) &&
                        (this.DateTime == compositeKey.DateTime));
            }

            return false;
        }
    }
}

An MSDN article on GetHashCode():

http://msdn.microsoft.com/en-us/library/system.object.gethashcode.aspx

Jon
  • 122
  • 12
  • I don't think that's actually 100% certain to be a unique hashcode, just very likely. – Hans Olsson May 20 '10 at 21:02
  • That may very well be true! According to the MSDN article linked, that is the recommended way to override GetHashCode(). However, since I don't use a lot of composite keys in my day-to-day work, I cannot say for certain. – Allen E. Scharfenberg May 20 '10 at 21:04
  • GetHashCode just gets you in the neighborhood - lets the implementation reduce the number of keys it needs to test for an exact match. – Jason Kleban May 20 '10 at 21:05
  • Jason, then perhaps Equals() should be overridden as well? To test for the exact match? If so, I will adjust my original code sample to include that. – Allen E. Scharfenberg May 20 '10 at 21:11
  • @Jason So, if I understand correctly, if there is a collision for the hashcode, the Dictionary still uses equality comparison to ensure it is selecting the correct object out of the two objects sharing a hashcode? – AaronLS May 20 '10 at 22:01
  • 4
    Yes. If you disassemble Dictionary.FindEntry() with Reflector, you'll see that both the hashcode AND the full equality are tested. The hashcode is tested first and, if it fails, short-circuits the condition without checking the full equality. If the hash passes, the equality is tested too. – Jason Kleban May 20 '10 at 23:14
  • 1
    And yes, equals should also be overridden to match. Even if you made GetHashCode() return 0 for any instance, Dictionary would still work, it'd just be slower. – Jason Kleban May 20 '10 at 23:19
  • I have added the Equals() override to the code snippit above per comments. – Allen E. Scharfenberg May 21 '10 at 13:11
  • 2
    The builtin Tuple type implements hash combination as '(h1 << 5) + h1 ^ h2' instead of your 'h1 ^ h2'. I guess they do that to avoid collisions everytime the two objects to hash are equal to the same value. – Eldritch Conundrum Mar 14 '12 at 10:41
15

How about Dictionary<int, Dictionary<int, Dictionary<DateTime, MyClass>>>?

This would allow you to do:

MyClass item = MyData[8][23923][date];
Jason Kleban
  • 20,024
  • 18
  • 75
  • 125
  • 1
    this will create a lot more object then uisng a CompositeKey struct or class. and will also be slower as two level of lookup will be used. – Ian Ringrose Aug 19 '10 at 08:44
  • I believe it is the same number of comparisons - I don't see how there'd be many more objects - composite key way still needs a key, and it's component values or objects and one dict to hold them. This nested way, you don't need the wrapper key for each object/value, one additional dict for each additional nest level. What do you think? – Jason Kleban Aug 19 '10 at 16:44
  • 11
    Based on my benchmarking, which I tried with keys with 2 and 3 parts: a nested dictionary solution is a 3-4x faster than using a tuple composite key approach. However, the tuple approach is a lot easier/tidier. – RickL May 31 '12 at 16:13
  • 5
    @RickL I can confirm those benchmarks, we use a type in our code-base, called `CompositeDictionary` (etc) which simply inherits from `Dictionary>` (or however many nested dictionaries are required. Without implementing the entire type from the ground up ourselves (instead of cheating using nested dictionaries or types to contain the keys) this is the fastest we get. – Adam Houldsworth Oct 02 '12 at 07:51
  • 1
    Nested dict approach should be faster only for half (?) the cases where the data is not present, since the intermediate dictionaries can bypass the full hash code computation & comparison. In presence of data, it should be slower since basic operations like Add, Contains etc should be performed thrice. Im sure the margin with tuple approach is beaten in some of the benchmarks mentioned above is about the implementation detail of .NET tuples which is quite poor considerin the boxing penalty it brings for value types. A properly implemented triplet is what I would go with, considering memory too – nawfal Jan 13 '14 at 01:16
  • Ignoring the type mess, this is by far the best approach. Aside from the performance gains, this also allows indexed searches where you only know the first, or the first and the second key, returning a fraction of the collection to iterate later on. – Guilherme Jan 22 '19 at 17:26
13

You can store them in a struct and use that as the key:

struct CompositeKey
{
  public int value1;
  public int value2;
  public DateTime value3;
}

Link to get hash code: http://msdn.microsoft.com/en-us/library/system.valuetype.gethashcode.aspx

kemiller2002
  • 113,795
  • 27
  • 197
  • 251
  • I'm stuck on .NET 3.5 so I don't have access to `Tuple`s so this is a good solution! – aarona Mar 28 '12 at 21:41
  • I'm surprised this is not more upvoted. It's a simple solution that is more readable than a Tuple. – Mark Feb 04 '13 at 23:20
  • 1
    According to msdn this performs ok, if no fields are reference types, otherwise it uses reflection for equality. – Gregor Slavec Apr 24 '13 at 07:49
  • @Mark The problem with a struct is that its default GetHashCode() implementation actually does not guarantee to use all the fields of the struct (leading to poor dictionary performance), whereas Tuple offer such guarantee. I have tested it. See http://stackoverflow.com/questions/3841602/why-is-valuetype-gethashcode-implemented-like-it-is for gory details. – Eldritch Conundrum Jan 08 '14 at 13:12
8

Now that VS2017/C#7 has come out, the best answer is to use ValueTuple:

// declare:
Dictionary<(string, string, int), MyClass> index;

// populate:
foreach (var m in myClassList) {
  index[(m.Name, m.Path, m.JobId)] = m;
}

// retrieve:
var aMyClass = index[("foo", "bar", 15)];

I chose to declare the dictionary with an anonymous ValueTuple (string, string, int). But I could have given them names (string name, string path, int id).

Perfwise, the new ValueTuple is faster than Tuple at GetHashCode but slower at Equals. I think you'd need to do complete end-to-end experiments to figure out which is really fastest for your scenario. But the end-to-end niceness and language syntax for ValueTuple makes it win out.

// Perf from https://gist.github.com/ljw1004/61bc96700d0b03c17cf83dbb51437a69
//
//              Tuple ValueTuple KeyValuePair
//  Allocation:  160   100        110
//    Argument:   75    80         80    
//      Return:   75   210        210
//        Load:  160   170        320
// GetHashCode:  820   420       2700
//      Equals:  280   470       6800
Kolja
  • 2,667
  • 1
  • 30
  • 29
Lucian Wischik
  • 2,160
  • 1
  • 20
  • 25
  • Yeah, I went through a big re-write just to have the Anonymous Type solution blow up in my face (can't compare anonymous types created with different assemblies). The ValueTuple seems to be a relatively elegant solution to the problem of compound dictionary keys. – Quark Soup Jan 03 '19 at 23:21
5

Two approaches immediately spring to mind:

  1. Do as Kevin suggested and write a struct that will serve as your key. Be sure to make this struct implement IEquatable<TKey> and to override its Equals and GetHashCode methods*.

  2. Write a class that utilizes nested dictionaries internally. Something like: TripleKeyDictionary<TKey1, TKey2, TKey3, TValue>... this class would internally have a member of type Dictionary<TKey1, Dictionary<TKey2, Dictionary<TKey3, TValue>>>, and would expose methods such as this[TKey1 k1, TKey2 k2, TKey3 k3], ContainsKeys(TKey1 k1, TKey2 k2, TKey3 k3), etc.

*A word on whether overriding the Equals method is necessary: while it's true that the Equals method for a struct compares the value of each member by default, it does so by using reflection -- which inherently entails performance costs -- and is therefore not a very appropriate implementation for something that is meant to be used as a key in a dictionary (in my opinion, anyway). According to the MSDN documentation on ValueType.Equals:

The default implementation of the Equals method uses reflection to compare the corresponding fields of obj and this instance. Override the Equals method for a particular type to improve the performance of the method and more closely represent the concept of equality for the type.

Dan Tao
  • 125,917
  • 54
  • 300
  • 447
  • Regarding 1, I don't think yuo need to override Equals and GetHashcode, the default implementation of Equals will automatically check for equality on all fields which I think should be ok on this struct. – Hans Olsson May 20 '10 at 20:57
  • @ho: It may not be *necessary*, but I would strongly advise doing so for any struct that's going to serve as a key. See my edit. – Dan Tao May 20 '10 at 23:28
3

If the key is part of the class then use KeyedCollection.
It is a Dictionary where the key is derived from the object.
Under the covers it is Dictionary
Don't have to repeat the key in the Key and Value.
Why take a chance the key is not the same in the Key as the Value.
Don't have to duplicate the same information in memory.

KeyedCollection Class

Indexer to expose the composite key

    using System.Collections.ObjectModel;

    namespace IntIntKeyedCollection
    {
        class Program
        {
            static void Main(string[] args)
            {
                Int32Int32DateO iid1 = new Int32Int32DateO(0, 1, new DateTime(2007, 6, 1, 8, 30, 52));
                Int32Int32DateO iid2 = new Int32Int32DateO(0, 1, new DateTime(2007, 6, 1, 8, 30, 52));
                if (iid1 == iid2) Console.WriteLine("same");
                if (iid1.Equals(iid2)) Console.WriteLine("equals");
                // that are equal but not the same I don't override = so I have both features

                Int32Int32DateCollection int32Int32DateCollection = new Int32Int32DateCollection();
                // dont't have to repeat the key like Dictionary
                int32Int32DateCollection.Add(new Int32Int32DateO(0, 0, new DateTime(2008, 5, 1, 8, 30, 52)));
                int32Int32DateCollection.Add(new Int32Int32DateO(0, 1, new DateTime(2008, 6, 1, 8, 30, 52)));
                int32Int32DateCollection.Add(iid1);
                //this would thow a duplicate key error
                //int32Int32DateCollection.Add(iid2);
                //this would thow a duplicate key error
                //int32Int32DateCollection.Add(new Int32Int32DateO(0, 1, new DateTime(2008, 6, 1, 8, 30, 52)));
                Console.WriteLine("count");
                Console.WriteLine(int32Int32DateCollection.Count.ToString());
                // reference by ordinal postion (note the is not the long key)
                Console.WriteLine("oridinal");
                Console.WriteLine(int32Int32DateCollection[0].GetHashCode().ToString());
                // reference by index
                Console.WriteLine("index");
                Console.WriteLine(int32Int32DateCollection[0, 1, new DateTime(2008, 6, 1, 8, 30, 52)].GetHashCode().ToString());
                Console.WriteLine("foreach");
                foreach (Int32Int32DateO iio in int32Int32DateCollection)
                {
                    Console.WriteLine(string.Format("HashCode {0} Int1 {1} Int2 {2} DateTime {3}", iio.GetHashCode(), iio.Int1, iio.Int2, iio.Date1));
                }
                Console.WriteLine("sorted by date");
                foreach (Int32Int32DateO iio in int32Int32DateCollection.OrderBy(x => x.Date1).ThenBy(x => x.Int1).ThenBy(x => x.Int2))
                {
                    Console.WriteLine(string.Format("HashCode {0} Int1 {1} Int2 {2} DateTime {3}", iio.GetHashCode(), iio.Int1, iio.Int2, iio.Date1));
                }
                Console.ReadLine();
            }
            public class Int32Int32DateCollection : KeyedCollection<Int32Int32DateS, Int32Int32DateO>
            {
                // This parameterless constructor calls the base class constructor 
                // that specifies a dictionary threshold of 0, so that the internal 
                // dictionary is created as soon as an item is added to the  
                // collection. 
                // 
                public Int32Int32DateCollection() : base(null, 0) { }

                // This is the only method that absolutely must be overridden, 
                // because without it the KeyedCollection cannot extract the 
                // keys from the items.  
                // 
                protected override Int32Int32DateS GetKeyForItem(Int32Int32DateO item)
                {
                    // In this example, the key is the part number. 
                    return item.Int32Int32Date;
                }

                //  indexer 
                public Int32Int32DateO this[Int32 Int1, Int32 Int2, DateTime Date1]
                {
                    get { return this[new Int32Int32DateS(Int1, Int2, Date1)]; }
                }
            }

            public struct Int32Int32DateS
            {   // required as KeyCollection Key must be a single item
                // but you don't really need to interact with Int32Int32DateS directly
                public readonly Int32 Int1, Int2;
                public readonly DateTime Date1;
                public Int32Int32DateS(Int32 int1, Int32 int2, DateTime date1)
                { this.Int1 = int1; this.Int2 = int2; this.Date1 = date1; }
            }
            public class Int32Int32DateO : Object
            {
                // implement other properties
                public Int32Int32DateS Int32Int32Date { get; private set; }
                public Int32 Int1 { get { return Int32Int32Date.Int1; } }
                public Int32 Int2 { get { return Int32Int32Date.Int2; } }
                public DateTime Date1 { get { return Int32Int32Date.Date1; } }

                public override bool Equals(Object obj)
                {
                    //Check for null and compare run-time types.
                    if (obj == null || !(obj is Int32Int32DateO)) return false;
                    Int32Int32DateO item = (Int32Int32DateO)obj;
                    return (this.Int32Int32Date.Int1 == item.Int32Int32Date.Int1 &&
                            this.Int32Int32Date.Int2 == item.Int32Int32Date.Int2 &&
                            this.Int32Int32Date.Date1 == item.Int32Int32Date.Date1);
                }
                public override int GetHashCode()
                {
                    return (((Int64)Int32Int32Date.Int1 << 32) + Int32Int32Date.Int2).GetHashCode() ^ Int32Int32Date.GetHashCode();
                }
                public Int32Int32DateO(Int32 Int1, Int32 Int2, DateTime Date1)
                {
                    Int32Int32DateS int32Int32Date = new Int32Int32DateS(Int1, Int2, Date1);
                    this.Int32Int32Date = int32Int32Date;
                }
            }
        }
    }

As for using value type fpr the key Microsoft specifically recommends against it.

ValueType.GetHashCode

Tuple is technically not a value type but suffers from the same symptom (hash collisions) and is not good candidate for a key.

d219
  • 2,707
  • 5
  • 31
  • 36
paparazzo
  • 44,497
  • 23
  • 105
  • 176
  • +1 for a more correct answer. Surprised nobody mentioned it earlier. In fact depending on how OP intents to use the structure, `HashSet` with an appropriate `IEqualityComparer` would be an option too. Btw, I think your answer will attract up votes if you can change your class names and other member names :) – nawfal Jan 13 '14 at 01:18
2

May I suggest an alternative - a anonymous object. It's the same we use in GroupBy LINQ method with multiple keys.

var dictionary = new Dictionary<object, string> ();
dictionary[new { a = 1, b = 2 }] = "value";

It may looks strange, but I've benchmarked Tuple.GetHashCode and new{ a = 1, b = 2 }.GetHashCode methods and the anonymous objects wins on my machine on .NET 4.5.1:

Object - 89,1732 ms for 10000 calls in 1000 cycles

Tuple - 738,4475 ms for 10000 calls in 1000 cycles

Michael Logutov
  • 2,551
  • 4
  • 28
  • 32
  • omg, this alternative was never on my mind... I don't know if it will behave well if you use a complex-type as composite key. – Gabriel Espinoza May 12 '15 at 22:01
  • If you simply pass an object (instead of anonymous one) the result of GetHashCode method of this object will be used. If you use it like `dictionary[new { a = my_obj, b = 2 }]` then the resulting hash code will be combination of my_obj.GetHashCode and ((Int32)2).GetHashCode. – Michael Logutov May 13 '15 at 07:31
  • DON'T USE THIS METHOD! Different assemblies create different names for anonymous types. While it looks anonymous to you, behind the scenes there's a concrete class created and two objects of two different classes will not be equal with the default operator. – Quark Soup Jan 03 '19 at 22:38
  • And how does that matters in this case? – Michael Logutov Jan 04 '19 at 12:04
0

Another solution to the ones already mentioned would be to store some kind of list of all keys generated so far and when a new object is generated you generate it's hashcode (just as a starting point), check if it's already in the list, if it is, then add some random value etc to it until you've got a unique key, then store that key in the object itself and in the list and return that as the key at all times.

Hans Olsson
  • 54,199
  • 15
  • 94
  • 116
0

As an alternative:

Maybe this will help somebody with this necessity.

One option would be to use strings as a Composite Key for the dictionary. Example:

var myDict = new Dictionary<string, bool>();
myDict.Add($"{1}-{1111}-{true}", true);
myDict.Add($"{1}-{1111}-{false}", false);

This way you can store keys with any format. If you want you can always define a function that builds your key as this:

string BuildKey(int number, string name, bool disabled) => $"{number}-{name}-{disabled}";