6

I have a List<Thing> things, where a number of Things need to be frequently retrieved by looking up a combination of two variables T1 f1 and T2 f2, which are value types. They way I do that now is simply things.Where(t => t.Field1 == f1 && t.Field2 == f2). However, I do extremely many of those lookups frequently, and need a more effective method.

Fortunately, things does not need to have elements removed or added, so I thought of parsing the list on construction and add to a Dictionary<T1, Lookup<T2, Thing>>. However, this feels messy, especially with the added parsing. And it gets really hairy if I need to lookup even more fields. Three fields would look like Dictionary<T1, Dictionary<T2, Lookup<T3, Thing>>>.

My next thought was to make a Lookup<Tuple<T1,T2,T3,...>,Thing>. But in this case, I am not sure whether the keys will actually work because Tuple is a reference type.

Even if I make a Lookup<ValueType<T1,T2,T3,...>,Thing> things, the lookup statement will be something like things[new ValueType<T1,T2,T3,...>(f1, f2, f3, ...)] which is pretty ugly (and I am still not sure whether I could trust those keys).

Is there a more elegant solution to this which keeps the performance benefits of a hashtable and where I could simply type something like IEnumerable<Thing> found = things[f1, f2, f3, ...];?

David S.
  • 5,965
  • 2
  • 40
  • 77
  • 1
    Have you considered using something like an SQLite in memory database? – CodingGorilla Aug 24 '12 at 16:56
  • Does `Thing` have an identification proerty (ID, PrimaryKey or whatever)? – Andre Calil Aug 24 '12 at 17:06
  • [C# Multi-key Generic Dictionary](http://www.codeproject.com/Articles/32894/C-Multi-key-Generic-Dictionary) – L.B Aug 24 '12 at 17:06
  • 1
    You can use `Tuple`s as dictionary keys since they are immutable. The rule for dictionary keys, as stated on the MSDN page (http://msdn.microsoft.com/en-us/library/xfhwa508.aspx) is that the value of the key cannot change as long as it is being used as a key. The implementation seems to use the hash code. Since a tuple isn't changing, and presumable produces the same hash code over time, it should be fine as a key. See also this: http://stackoverflow.com/questions/1483059/is-this-expected-c-sharp-4-0-tuple-equality-behavior – siride Aug 24 '12 at 17:07
  • Here's a related question with some good information: http://stackoverflow.com/questions/955982/tuples-or-arrays-as-dictionary-keys-in-c-sharp – JamieSee Aug 24 '12 at 22:44
  • What is the type? Or do you mean any value type? – paparazzo Oct 02 '12 at 00:52

5 Answers5

3

Lookup<Tuple<T1,T2,T3,...>,Thing> will work, since Tuple overrides Equals and GetHashCode.

To make the lookup syntax less ugly, you can use Tuple.Create which supports type inference. Your code becomes things[Tuple.Create(f1, f2, f3, ...)]. If that's still too ugly, it's trivial to add a helper method that takes the individual values as parameters.

I'd also consider creating my own immutable class(or value type) for the key, so you get clean field names instead of ItemX. You just need to override Equals and GetHashCode consistently.

CodesInChaos
  • 106,488
  • 23
  • 218
  • 262
  • Maybe I was on to something then. This sure seems like the simple, tidy solution. I am not sure what you meant by "so you get clean field names instead of `ItemX`" though? And how do I override `Equals` and `GetHashCode`? I don't know how those implementations should look like. – David S. Aug 27 '12 at 10:13
  • 1
    The `ItemX` occurs if you use the `Tuple` class. The idea is instead to create your own class that has better property names than `Item1`, `Item2`, etc. – Oliver Aug 27 '12 at 10:34
  • Tuple generates a lot hash collitions. It is not a good candidate for a key. – paparazzo Oct 02 '12 at 00:49
2

You can create multiple lookups, and then intersect them to do your searches. Here is a somewhat oversimplified example, but it should illustrate the idea:

class Test {
    public string A { get; set; }
    public string B { get; set; }
    public string C { get; set; }
}

var list = new List<Test> {
    new Test {A = "quick", B = "brown", C = "fox"}
,   new Test {A = "jumps", B = "over", C = "the"}
,   new Test {A = "lazy", B = "dog", C = "quick"}
,   new Test {A = "brown", B = "fox", C = "jumps"}
,   new Test {A = "over", B = "the", C = "lazy"}
,   new Test {A = "dog", B = "quick", C = "brown"}
,   new Test {A = "fox", B = "jumps", C = "over"}
,   new Test {A = "the", B = "lazy", C = "dog"}
,   new Test {A = "fox", B = "brown", C = "quick"}
,   new Test {A = "the", B = "over", C = "jumps"}
,   new Test {A = "quick", B = "dog", C = "lazy"}
,   new Test {A = "jums", B = "fox", C = "brown"}
,   new Test {A = "lazy", B = "the", C = "over"}
,   new Test {A = "brown", B = "quick", C = "dog"}
,   new Test {A = "over", B = "jumps", C = "fox"}
,   new Test {A = "dog", B = "lazy", C = "the"}
};
var byA = list.ToLookup(v => v.A);
var byB = list.ToLookup(v => v.B);
var byC = list.ToLookup(v => v.C);
var all = byA["quick"].Intersect(byB["dog"]);
foreach (var test in all) {
    Console.WriteLine("{0} {1} {2}", test.A, test.B, test.C);
}
all = byA["fox"].Intersect(byC["over"]);
foreach (var test in all) {
    Console.WriteLine("{0} {1} {2}", test.A, test.B, test.C);
}

This prints

quick dog lazy
fox jumps over
Sergey Kalinichenko
  • 714,442
  • 84
  • 1,110
  • 1,523
  • Can be fast if the rarest word you search for is rare enough. Can be slow if it isn't. – CodesInChaos Aug 24 '12 at 17:17
  • @CodesInChaos That is true, if the words are not distributed well, you wouldn't get much speed-up. Although I suppose you'd still beat the "full scan" method described at the top of the post. – Sergey Kalinichenko Aug 24 '12 at 17:21
  • A slight variant on this is using the lookup for the rarest word, and then filtering down with `Where` from there on. Might be slightly faster and takes less memory. – CodesInChaos Aug 24 '12 at 18:17
  • Interesting solution. I can not be sure of the speed consistency before I do a thorough test on variations of my dataset though. – David S. Aug 27 '12 at 10:11
1

Have you considered using a hash table with some kind of combination of the Fields as the key? I don't know enough about your data set to say if this is viable or not. Since the keys would need to be unique. But since you're not doing additions or removals using a hash table for look ups in memory is about as fast as you can get.

Justin
  • 4,002
  • 2
  • 19
  • 21
1

If i got you right, you can use Hashtable with Tuple, example below:

        // populate Hastable
        var hash = new Hashtable();            
        var tuple = Tuple.Create("string", 1, 1.0);
        hash.Add(tuple,tuple);

        // search for item you want
        var anotherTuple = Tuple.Create("string", 1, 1.0);
        // result will be tuple declared above
        var result = hash[anotherTuple];

more complex solution (if duplicate keys needed):

public class Thing
{
    public int Value1 { get; set; }

    public double Value2 { get; set; }

    public string Value3 { get; set; }

    // preferable to create own Equals and GetHashCode methods
    public Tuple<int, double>  GetKey()
    {
       // create key on fields you want 
       return Tuple.Create(Value1, Value2);
    }
}

usage

 var t1 = new Thing() {Value1 = 1, Value2 = 1.0, Value3 = "something"};
 var t2 = new Thing() {Value1 = 1, Value2 = 2.0, Value3 = "something"};
 var hash = new [] { t1, t2 }.ToLookup(item => item.GetKey());

 var criteria = new Thing() { Value1 = 1, Value2 = 2.0, value3 = "bla-bla-bla" };
 var r = hash[criteria.GetKey()]; // will give you t1
user854301
  • 5,383
  • 3
  • 28
  • 37
0

The Linq Where or Dictionary of Dictionaries is probably the prettiest you are going to get. But it may be more of a question of how you are organising your data.

E.G. This never going to be a pretty way of accessing people data:

people["FirstName"]["LastName"] 

It is usually better so try and come up with a simpler key.

Liazy
  • 301
  • 2
  • 11