Tuple vs string as a Dictionary key in C#

Question

I have a cache that I implement using a ConcurrentDictionary, The data that I need to keep depends on 5 parameters. So the Method to get it from the cache is: (I show only 3 parameters here for simplicity, and I changed the data type to represent CarData for clearity)

public CarData GetCarData(string carModel, string engineType, int year);

I wonder what type of key will be better to use in my ConcurrentDictionary, I can do it like this:

var carCache = new ConcurrentDictionary<string, CarData>();
// check for car key
bool exists = carCache.ContainsKey(string.Format("{0}_{1}_{2}", carModel, engineType, year);

Or like this:

var carCache = new ConcurrentDictionary<Tuple<string, string, int>, CarData>();
// check for car key
bool exists = carCache.ContainsKey(new Tuple(carModel, engineType, year));

I don't use these parameters together any other place, so there is no justification to create a class just to keep them together.

I want to know which approach is a better in terms of performance and maintainability.

If you're talking about maintainability and readability, I'd say you'd still create a class for the parameters/key with a custom comparer. If you use a class, you only have to edit said class, and you won't have to edit two or more places in your code. That's all my opinion though. If I'd have to choose between your two options, I'd go for the tuple as key. It's worse performance-wise, but it's easier to understand/maintain. — , Jan 30 '17 at 09:36
I prefer performance over understanding, when the cahce will hold hundreds of keys and get them slowly, no one will care for code that looks better... — Shahar, Jan 30 '17 at 09:39
My company always states the following: "You'll write your code once, but your code will be read a thousand times." It's all up to you though, I just find that readability is important for code I write :) — , Jan 30 '17 at 09:40
Just as RandomStranger, I would prefer a class for maintainability, but also for performance. Inside your own class, you could override `GetHashCode` based on the most unique value and short circuit the `Equals` based on comparing the most unique values first. `GetHashCode` should be as simple as possible while still returning a value that is as unique as can be for optimal dictionary performance — Me.Name, Jan 30 '17 at 09:41
I find it odd that you would need 5 different keys for the same dictionary. — o_weisman, Jan 30 '17 at 09:42
If you use string concatenation then you'll need to make very sure that you don't duplicate keys. If any of your strings contain _ then they can be immediately ambiguous. It may of course be that this can't happen with the format of your keys but it is something to be aware of and why I would be more inclined towards Tuples than strings. — Chris, Jan 30 '17 at 09:43
@o_weisman: I've come across similar situations where the code is basically caching an expensive method call in a dictionary. The dictionary key needs to be made up of all of the paramaters to the method. — Chris, Jan 30 '17 at 09:45
My Moto is: "Always code as if the guy who ends up maintaining your code will be a violent psychopath who knows where you live." :) — Shahar, Jan 30 '17 at 09:46
@Chris: Thanks for the idea for the UnitTest, the strings doesn't contain '_', this is why I chose it in the string key — Shahar, Jan 30 '17 at 09:55
The strings may not contain _ now, but can you say for certain that they never will in future? Using a tuple, or a custom class, prevents this breaking in future if the possible contents of the keys changes. — Andy Nichols, Jan 30 '17 at 12:20
_"I don't use these parameters together any other place, so there is no justification to create a class"_ - What a horrible misconception. I hope that is not common. — BlueRaja - Danny Pflughoeft, Jan 30 '17 at 23:36
@BlueRaja-DannyPflughoeft: The parameters that are sent to this method are part of different classes that exist in the code. This method is the only place that I use them together. I think that I will arrange the classes of this code differently to fix this disorder. You can tell me your conception if you think I'm so wrong... Thanks for the honesty — Shahar, Jan 31 '17 at 09:19
I took everyones advices and used Tim's implementation using a class. The code is now clearer and I'm happy with the result — Shahar, Feb 15 '17 at 11:39
As a side note, the `ConcurrentDictionary` class is only suitable for very trivial caching scenarios. For anything more advanced (like expiration policies and such), there are specialized classes available. Like the [`System.Runtime.Caching.MemoryCache`](https://learn.microsoft.com/en-us/dotnet/api/system.runtime.caching.memorycache) (with `string` keys), and the newer [`Microsoft.Extensions.Caching.Memory.MemoryCache`](https://learn.microsoft.com/en-us/dotnet/api/microsoft.extensions.caching.memory.memorycache) (with `object` keys). The later offers more prioritization options. — Theodor Zoulias, Apr 05 '20 at 18:21

InBetween · Answer 1 · 2017-01-30T14:36:10.993

I want to know which approach is a better in terms of performance and maintainability.

As always, you have the tools to figure it out. Code both possible solutions and make them race. The one that wins is the winner, you don't need anyone here to answer this particular question.

About maintenance, the solution that autodocuments itself better and has better scalability should be the winner. In this case, the code is so trivial that autodocumentation isn't that much of an issue. From a scalability point of view, IMHO, the best solution is to use Tuple<T1, T2, ...>:

You get free equality semantics which you don't need to maintain.
Collisions are not possible, something that is not true if you choose the string concatenation solution:
```
var param1 = "Hey_I'm a weird string";
var param2 = "!"
var param3 = 1;
key = "Hey_I'm a weird string_!_1";

var param1 = "Hey";
var param2 = "I'm a weird string_!"
var param3 = 1;
key = "Hey_I'm a weird string_!_1";
```
Yeah, far fetched, but, in theory, entirely possible and your question is precisely about unknown events in the future, so...
And last, but not least, the compiler helps you maintain the code. If, for example, tomorrow you have to add param4 to your key, Tuple<T1, T2, T3, T4> will strongly type your key. Your string concatenation algorithm on the other hand can live on blissfully happy generating keys without param4 and you wont know whats happening until your client calls you up because their software is not working as expected.

+1 All good points, and a sensible discussion given the question, though I'd still be inclined to write my own type if anyone else ever had to use the code. — VisualMelon, Jan 30 '17 at 12:33
@VisualMelon Thnks. Yes, I agree completely but there is already an upvoted question addressing that particular solution. I just focused on answering the exact question. I'd also probably implement a private nested class to handle the key even though it *does* add an additional maintenance cost. — InBetween, Jan 30 '17 at 12:37
Aye, I think there is great value in both kinds of answers, certainly no slight intended on my part, your answer is very good as it stands. — VisualMelon, Jan 30 '17 at 12:40
This works, but a bit slower than creating a type. Definitely better than a string key. http://stackoverflow.com/a/41938199/1176983 — Tomer, Jan 30 '17 at 14:42

svick · Answer 2 · 2017-08-16T13:26:52.743

If performance is really important, then the answer is that you shouldn't use either option, because both unnecessarily allocate an object on every access.

Instead, you should use a struct, either a custom one, or ValueTuple from the System.ValueTuple package:

var myCache = new ConcurrentDictionary<ValueTuple<string, string, int>, CachedData>();
bool exists = myCache.ContainsKey(ValueTuple.Create(param1, param2, param3));

C# 7.0 also contais syntax sugar to make this code easier to write (but you don't need to wait for C# 7.0 to start using ValueTuple without the sugar):

var myCache = new ConcurrentDictionary<(string, string, int), CachedData>();
bool exists = myCache.ContainsKey((param1, param2, param3));

you can even name the individual parts of the key, so it becomes much like a custom class where you implemented you own Equals and GetHashCode — Jim Wolff, Aug 16 '17 at 11:02

score 14 · Accepted Answer · edited May 23 '17 at 12:09

14

You could create a class (doesn't matter that its only used here) that overrides GetHashCode and Equals:

Thanks theDmi (and others) for improvements...

public class CarKey : IEquatable<CarKey>
{
    public CarKey(string carModel, string engineType, int year)
    {
        CarModel = carModel;
        EngineType= engineType;
        Year= year;
    }

    public string CarModel {get;}
    public string EngineType {get;}
    public int Year {get;}

    public override int GetHashCode()
    {
        unchecked // Overflow is fine, just wrap
        {
            int hash = (int) 2166136261;

            hash = (hash * 16777619) ^ CarModel?.GetHashCode() ?? 0;
            hash = (hash * 16777619) ^ EngineType?.GetHashCode() ?? 0;
            hash = (hash * 16777619) ^ Year.GetHashCode();
            return hash;
        }
    }

    public override bool Equals(object other)
    {
        if (ReferenceEquals(null, other)) return false;
        if (ReferenceEquals(this, other)) return true;
        if (other.GetType() != GetType()) return false;
        return Equals(other as CarKey);
    }

    public bool Equals(CarKey other)
    {
        if (ReferenceEquals(null, other)) return false;
        if (ReferenceEquals(this, other)) return true;
        return string.Equals(CarModel,obj.CarModel) && string.Equals(EngineType, obj.EngineType) && Year == obj.Year;
    }
}

If you don't override those, ContainsKey does a reference equals.

Note: the Tuple class does have its own equality functions that would basically do the same as above. Using a bespoke class makes it clear that is what is intended to happen - and is therefore better for maintainability. It also has the advantage that you can name the properties so it is clear

Note 2: the class is immutable as dictionary keys need to be to avoid potential bugs with hashcodes changing after the object is added to the dictionary See here

GetHashCode taken from here

edited May 23 '17 at 12:09

Community

1
1

answered Jan 30 '17 at 09:39

Tim Rutter

4,549
3
23
47

4

Although this answer is good, I'd recommend that `Key` implements `IEquatable` which autodocuments better the type's value semantics of the equalitiy check. – InBetween Jan 30 '17 at 09:44
Doesn't `Tuple` already implement comparison logic? – Matteo Umili Jan 30 '17 at 09:51
@MatteoUmili Yes it does - the above approach gives full control of the comparison and I think in the case of using an object as a key its important that its clear what comparison is being performed. The question was which is best for maintainability - I think the above is better for that. – Tim Rutter Jan 30 '17 at 09:56
3

I don't quite understand how having a redundant type is better for maintainability. Your example has nothing over the tuple. What do you need "full control over comparison" for? You conveniently skipped a quite important thing which is implementation of `GetHashCode`. It seems like explicitly reimplementing a stub of tuple, just for being verbose. Introducing unnecessary entities into architecture is not a good thing. This approach is fine, but only if the existing standard types would be not sufficient. – luk32 Jan 30 '17 at 11:22
4

This class REALLY should be immutable. – theDmi Jan 30 '17 at 11:23
@luk32 I see your point. I guess the reason I post the above is that there may be more to the comparison than just Param1 == other.param1 etc. Only the OP knows the full context. Yes I agree that in a simple case Tuple does everything that is needed. The advantage of having a separate class for it is that it is clear that the simple case was the intended method of comparison. From a maintability point of view, should the "key" be extended in future and have a member added where a simple equals comparison is no longer valid, then this would not be obvious with Tuple. – Tim Rutter Jan 30 '17 at 11:47
@theDmi perhaps you could explain the reason for your post? – Tim Rutter Jan 30 '17 at 11:48
@TimRutter Mutable dictionary keys lead to hard to catch bugs, see http://stackoverflow.com/a/15439203/219187 for a very good explanation why. – theDmi Jan 30 '17 at 11:51
@theDmi Thanks for that. Yes I see why mutability would be bad. The above is just food for thought so I won't update my answer. Tuple would have the same issue. – Tim Rutter Jan 30 '17 at 11:54
1

Actually that's not the case, tuples **are** immutable. – theDmi Jan 30 '17 at 11:57
Ok I've updated my answer to be complete. I don't tend to use Tuples as I find them syntactically unpleasant. Thanks for raising the point of immutability though. – Tim Rutter Jan 30 '17 at 12:00
Your `GetHashCode` trivially collides parameter sets where `param1 == param2`. This may or may not be an issue in practice, but it's worth noting. (Methods to avoid this include multiply-add hashes with small constants.) `Tuple` at least avoids this natively. – Jeroen Mostert Jan 30 '17 at 13:41
1

@JeroenMostert thanks, I wasn't trying to create the full implementation but I've updated my answer, the more I update it the more questions it throws up - learnt a few things though so hopefully useful to others as well. – Tim Rutter Jan 30 '17 at 14:06
1

Yes, it illustrates why `Tuple` is a good idea at least from a productivity standpoint. Getting all these details right for your own separate class is a chore. :-) – Jeroen Mostert Jan 30 '17 at 14:08
In my comparison, this is the fastest way: http://stackoverflow.com/a/41938199/1176983 – Tomer Jan 30 '17 at 14:41
@luk32 The big benefit of this approach is that the parameters can be named. Of course, _we_ can't do that because OP never told us what the parameters are for. – BlueRaja - Danny Pflughoeft Jan 30 '17 at 23:37

theDmi · Answer 4 · 2017-01-30T12:35:37.750

Implement a custom key class and make sure it is suitable for such use cases, i.e. implement IEquatable and make the class immutable:

public class CacheKey : IEquatable<CacheKey>
{
    public CacheKey(string param1, string param2, int param3)
    {
        Param1 = param1;
        Param2 = param2;
        Param3 = param3;
    }

    public string Param1 { get; }

    public string Param2 { get; }

    public int Param3 { get; }

    public bool Equals(CacheKey other)
    {
        if (ReferenceEquals(null, other)) return false;
        if (ReferenceEquals(this, other)) return true;
        return string.Equals(Param1, other.Param1) && string.Equals(Param2, other.Param2) && Param3 == other.Param3;
    }

    public override bool Equals(object obj)
    {
        if (ReferenceEquals(null, obj)) return false;
        if (ReferenceEquals(this, obj)) return true;
        if (obj.GetType() != GetType()) return false;
        return Equals((CacheKey)obj);
    }

    public override int GetHashCode()
    {
        unchecked
        {
            var hashCode = Param1?.GetHashCode() ?? 0;
            hashCode = (hashCode * 397) ^ (Param2?.GetHashCode() ?? 0);
            hashCode = (hashCode * 397) ^ Param3;
            return hashCode;
        }
    }
}

This is a GetHashCode() implementation how Resharper generates it. It is a good general-purpose implementation. Adapt as required.

Alternatively, use something like Equ (I'm the creator of that library) that automatically generates Equals and GetHashCode implementations. This will make sure that these methods always include all members of the CacheKey class, so the code becomes much easier to maintain. Such an implementation would then simply look like this:

public class CacheKey : MemberwiseEquatable<CacheKey>
{
    public CacheKey(string param1, string param2, int param3)
    {
        Param1 = param1;
        Param2 = param2;
        Param3 = param3;
    }

    public string Param1 { get; }

    public string Param2 { get; }

    public int Param3 { get; }
}

Note: You should obviously use meaningful property names, otherwise introducing a custom class does not provide much benefit over using a Tuple.

If the object is imputable, and exclusivly used as dictioanary keys, shouldn't the hash code be computed at object initialization? — Taemyr, Jan 30 '17 at 11:44
It could, but why? What you're suggesting is micro-optimization. You'll have to profile a realistic use case with both versions and see which one is faster if you really want to know the answer to this question. — theDmi, Jan 30 '17 at 11:46
I think that the main benefit of writing your own class over using `Tuple` is that `Item1` and `Item2` don't mean _anything_, the same goes for `Param1`, and `Param2`. I would suggest including a note that if you are going to the effort of creating a dedicated class for this case, that you should go the full hog and _clearly name_ each parameter, because all I see here are 2 strings and an int. Maybe you can work out what the in is, but how are you to know which string goes where? (P.S. +1 for immutable and viable `GetHashCode` implementation) — VisualMelon, Jan 30 '17 at 12:30
@VisualMelon Thanks for the note, that is of course the idea, I'll clarify the answer. Since the names in the question are meaningless, I did the same here... — theDmi, Jan 30 '17 at 12:33
What about `IStructuralEquatable` which is what I believe `Tuple<>` is? At least I believe this use case is as at least *partially* (perhaps wholly) why that interface exists — pinkfloydx33, Jan 30 '17 at 12:57
@pinkfloydx33 Theoretically yes. Unfortunately that interface provides only little benefits, see http://stackoverflow.com/q/3609823/219187 — theDmi, Jan 30 '17 at 13:07
@theDmi: Why did you add the uncheck scope to the GetHashCode method? — Shahar, Jan 30 '17 at 15:32

Tomer · Answer 5 · 2017-01-30T14:32:29.600

I wanted to compare the Tuple versus Class versus "id_id_id" approaches described in the other comments. I used this simple code:

public class Key : IEquatable<Key>
{
    public string Param1 { get; set; }
    public string Param2 { get; set; }
    public int Param3 { get; set; }

    public bool Equals(Key other)
    {
        if (ReferenceEquals(null, other)) return false;
        if (ReferenceEquals(this, other)) return true;
        return string.Equals(Param1, other.Param1) && string.Equals(Param2, other.Param2) && Param3 == other.Param3;
    }

    public override bool Equals(object obj)
    {
        if (ReferenceEquals(null, obj)) return false;
        if (ReferenceEquals(this, obj)) return true;
        if (obj.GetType() != this.GetType()) return false;
        return Equals((Key) obj);
    }

    public override int GetHashCode()
    {
        unchecked
        {
            var hashCode = (Param1 != null ? Param1.GetHashCode() : 0);
            hashCode = (hashCode * 397) ^ (Param2 != null ? Param2.GetHashCode() : 0);
            hashCode = (hashCode * 397) ^ Param3;
            return hashCode;
        }
    }
}

static class Program
{

    static void TestClass()
    {
        var stopwatch = new Stopwatch();
        stopwatch.Start();
        var classDictionary = new Dictionary<Key, string>();

        for (var i = 0; i < 10000000; i++)
        {
            classDictionary.Add(new Key { Param1 = i.ToString(), Param2 = i.ToString(), Param3 = i }, i.ToString());
        }
        stopwatch.Stop();
        Console.WriteLine($"initialization: {stopwatch.Elapsed}");

        stopwatch.Restart();

        for (var i = 0; i < 10000000; i++)
        {
            var s = classDictionary[new Key { Param1 = i.ToString(), Param2 = i.ToString(), Param3 = i }];
        }

        stopwatch.Stop();
        Console.WriteLine($"Retrieving: {stopwatch.Elapsed}");
    }

    static void TestTuple()
    {
        var stopwatch = new Stopwatch();
        stopwatch.Start();
        var tupleDictionary = new Dictionary<Tuple<string, string, int>, string>();

        for (var i = 0; i < 10000000; i++)
        {
            tupleDictionary.Add(new Tuple<string, string, int>(i.ToString(), i.ToString(), i), i.ToString());
        }
        stopwatch.Stop();
        Console.WriteLine($"initialization: {stopwatch.Elapsed}");

        stopwatch.Restart();

        for (var i = 0; i < 10000000; i++)
        {
            var s = tupleDictionary[new Tuple<string, string, int>(i.ToString(), i.ToString(), i)];
        }

        stopwatch.Stop();
        Console.WriteLine($"Retrieving: {stopwatch.Elapsed}");
    }

    static void TestFlat()
    {
        var stopwatch = new Stopwatch();
        stopwatch.Start();
        var tupleDictionary = new Dictionary<string, string>();

        for (var i = 0; i < 10000000; i++)
        {
            tupleDictionary.Add($"{i}_{i}_{i}", i.ToString());
        }
        stopwatch.Stop();
        Console.WriteLine($"initialization: {stopwatch.Elapsed}");

        stopwatch.Restart();

        for (var i = 0; i < 10000000; i++)
        {
            var s = tupleDictionary[$"{i}_{i}_{i}"];
        }

        stopwatch.Stop();
        Console.WriteLine($"Retrieving: {stopwatch.Elapsed}");
    }

    static void Main()
    {
        TestClass();
        TestTuple();
        TestFlat();
    }
}

Results:

I ran each method 3 times in Release without debugging, each run commenting out the calls to the other methods. I took the average of the 3 runs, but there wasn't much variance anyway.

TestTuple:

initialization: 00:00:14.2512736
Retrieving: 00:00:08.1912167

TestClass:

initialization: 00:00:11.5091160
Retrieving: 00:00:05.5127963

TestFlat:

initialization: 00:00:16.3672901
Retrieving: 00:00:08.6512009

I was surprised to see that the class approach was faster than both the tuple approach and the string approach. In my opinion it's more readable and more future-safe, in the sense that more functionality can be added to the Key class (assuming it's not just a key, it represents something).

Great job comparing. It seems like the best solution is to use a class, but all the given parameters are parts of other classes in the reset of the code... — Shahar, Jan 30 '17 at 15:25
Thanks. Even if the parameters are parts of other classes, I see no problem creating a new class. — Tomer, Jan 30 '17 at 15:34
I think it's time to arrange all the classes and structs that are used in this code, and then I will create the new class for the key. It will take some time, since this isn't my code. — Shahar, Jan 30 '17 at 15:40

score 4 · Answer 6 · answered Feb 27 '18 at 23:05

I ran Tomer's test cases, adding ValueTuples as a test case (new c# value type). Was impressed at how well they performed.

TestClass
initialization: 00:00:11.8787245
Retrieving: 00:00:06.3609475

TestTuple
initialization: 00:00:14.6531189
Retrieving: 00:00:08.5906265

TestValueTuple
initialization: 00:00:10.8491263
Retrieving: 00:00:06.6928401

TestFlat
initialization: 00:00:16.6559780
Retrieving: 00:00:08.5257845

Code for the test is below:

static void TestValueTuple(int n = 10000000)
{
    var stopwatch = new Stopwatch();
    stopwatch.Start();
    var tupleDictionary = new Dictionary<(string, string, int), string>();

    for (var i = 0; i < n; i++)
    {
        tupleDictionary.Add((i.ToString(), i.ToString(), i), i.ToString());
    }
    stopwatch.Stop();
    Console.WriteLine($"initialization: {stopwatch.Elapsed}");

    stopwatch.Restart();

    for (var i = 0; i < n; i++)
    {
        var s = tupleDictionary[(i.ToString(), i.ToString(), i)];
    }

    stopwatch.Stop();
    Console.WriteLine($"Retrieving: {stopwatch.Elapsed}");
}

score 3 · Answer 7 · answered Jan 30 '17 at 09:38

IMHO, I prefer to use in such cases some intermediate structure (in your case it will be Tuple). Such approach creates additional layer between parameters and end-target dictionary. Of course, it will be depend on purposes. Such way for example allows you to create not trivial transition of parameters (for example container may "distort" data).

Tuple vs string as a Dictionary key in C#

7 Answers7

Results: