1

I have a custom class that I was trying to use as a key for a dictionary:

// I tried setting more than enough capacity also...
var dict = new Dictionary<MyPoint, MyPoint>(capacity);

Now let me be clear, the goal here is to compare two SIMILAR but DIFFERENT lists, using X, Y, and Date as a composite key. The values will vary between these two lists, and I'm trying to quickly compare them and compute their differences.

Here is the class code:

public class MyPoint : IEquatable<MyPoint>
{
    public short X { get; set; }
    public short Y { get; set; }
    public DateTime Date { get; set; }
    public double MyValue { get; set; }

    public override bool Equals(object obj)
    {
        return base.Equals(obj as MyPoint);
    }

    public bool Equals(MyPoint other)
    {
        if (other == null)
        {
            return false;
        }

        return (Date == other.Date)
            && (X == other.X)
            && (Y == other.Y);
    }

    public override int GetHashCode()
    {
        return Date.GetHashCode()
             | X.GetHashCode()
             | Y.GetHashCode();
    }
}

I also tried keying with a struct:

public struct MyPointKey
{
    public short X;
    public short Y;
    public DateTime Date;
    // The value is not on these, because the struct is only used as key
}

In both cases dictionary writing was very, very slow (reading was quick).

I changed the key to a string, with the format:

var dict = new Dictionary<string, MyPoint>(capacity);
var key = string.Format("{0}_{1}", item.X, item.Y);

I was amazed at how much quicker this is -- it's at least 10 times faster. I tried Release mode, no debugger, and every scenario I could think of.

This dictionary will contain 350,000 or more items, so performance does matter.

Any thoughts or suggestions? Thanks!

Another edit...

I'm trying to compare two lists of things in the fastest way I can. This is what I'm working with. The Dictionary is important for fast lookups against the source list.

IList<MyThing> sourceList;
IDictionary<MyThing, MyThing> comparisonDict;

Parallel.ForEach(sourceList,
    sourceItem =>
    {
        double compareValue = 0;

        MyThing compareMatch = null;
        if (comparisonDict.TryGetValue(sourceItem, out compareMatch))
        {
            compareValue = compareMatch.MyValue;
        }

        // Do a delta check on the item
        double difference = sourceItem.MyValue- compareValue;
        if (Math.Abs(difference) > 1)
        {
            // Record the difference...
        }
    });
jocull
  • 20,008
  • 22
  • 105
  • 149
  • 1
    What's the reason for having a dictionary with two of the same type? How are the `key` and `value` related? Are they the same object? – ps2goat Jun 03 '15 at 14:52
  • 3
    You've answered your own question - using a complex type as a key to a Dictionary is slower than using a primitive type. – paul Jun 03 '15 at 14:53
  • 3
    Keep in mind - do not use mutable objects as dictionary keys. Also with immutable point you can calculate and store hash code during point initialization. Thus you will have speed of primitive type – Sergey Berezovskiy Jun 03 '15 at 14:54
  • Yes, they're the same object, but the reason is for keying and lookup time. Sorry, I should list the 4th propery that is used - I will edit. I tried using a `HashSet` originally, but couldn't find the best way to get the item out of it, only check for contains. – jocull Jun 03 '15 at 14:54
  • 2
    I guess you dont know the `HashSet`, do you? **Edit** Why do you want to get the item out of it? You already have it. – Tim Schmelter Jun 03 '15 at 14:54
  • @TimSchmelter Can you lookup an item by key in a HashSet? – jocull Jun 03 '15 at 14:55
  • I think you should use the `Date` field value as dictionary key – Binkan Salaryman Jun 03 '15 at 14:56
  • 1
    @jocull: what means _item_? If its the `MyPoint` you already have it. – Tim Schmelter Jun 03 '15 at 14:56
  • 5
    Also, `GetHashCode()` uses | which might not be the best method. – Dennis_E Jun 03 '15 at 14:57
  • The problem here is I am comparing two lists that have unique X, Y, and Date, but different values. I need to key each with those three but then see how different the value portion is (see edited question) – jocull Jun 03 '15 at 14:57
  • @Dennis_E Can you provide/link to a better hashing method? – jocull Jun 03 '15 at 14:57
  • 1
    @jocull search this site for 'GetHashCode implementation' – Sergey Berezovskiy Jun 03 '15 at 14:57
  • 2
    @Dennis_E: `|` is the binary OR operator – Binkan Salaryman Jun 03 '15 at 14:58
  • 3
    @BinkanSalaryman Usually you use `^` (XOR) for combining hash codes. – juharr Jun 03 '15 at 14:59
  • 4
    @BinkanSalaryman I know what it is. I was just saying that's not how you implement a hashcode. A popular method is multiplying each individual hashcode by a prime number and adding them up. – Dennis_E Jun 03 '15 at 15:00
  • If the key and value are both the same, you're better off creating a `List` and just searching the actual objects' properties. You could also reate your own hash function that contains the unique values you will use for looking up the data, e.g., ` return x.ToString() + "|" + y.ToString()`. – ps2goat Jun 03 '15 at 15:01
  • @BinkanSalaryman Similar to this: http://stackoverflow.com/a/720282/97964 – jocull Jun 03 '15 at 15:02
  • 1
    @ps2goat Using two arrays and iterating them both is much slower than a string key in a Dictionary for lookup. (just tested it). If both lists have 350k items, it's a lot of iterations (well over a billion?) – jocull Jun 03 '15 at 15:09
  • @jocull: i still dont know what the issue is. What must be quick, finding the value that belongs to a given key? Then use your dictionary. You said it yourself that it's quick. So what is slow? Also, you say you want to compare two lists. Are you creating the dictionary always by the way? Maybe you can replace your two lists completely. – Tim Schmelter Jun 03 '15 at 15:22
  • @TimSchmelter I added another edit with the details of the comparison. – jocull Jun 03 '15 at 15:37
  • @jocull, that's why I said you could create your own hashcode from the unique lookup values (if you still wanted a dictionary). The list option was an alternative to keying on an object. – ps2goat Jun 03 '15 at 15:45

2 Answers2

3

As others have said in the comments, the problem is in your GetHashCode() implementation. Taking your code, and running 10,000,000 iterations with the string key took 11-12 seconds. Running with your existing hashCode I stopped it after over a minute. Using the following hashCode implementation took under 5 seconds.

public override int GetHashCode()
{
    var hashCode = Date.GetHashCode();
    hashCode = (hashCode * 37) ^ X.GetHashCode();
    hashCode = (hashCode * 37) ^ Y.GetHashCode();
    return hashCode;
}

The problem is that when you get into large numbers, the items are all colliding in the same buckets, due to the ORs. A dictionary where everything is in the same bucket is just a list.

Kyle W
  • 3,702
  • 20
  • 32
  • 2
    Whoa, that made a ***massive*** difference. Thanks! The only edit I made was I wrapped the code inside in an `unchecked` block so the ints are allowed to roll over. Dictionary creation is instantaneous now! – jocull Jun 03 '15 at 15:42
0

If I got you right, you like to use a set while still maintaining the order of the keys. In this case, take SortedSet`1 instead.

Code:

class Program {
    static void Main(string[] args) {
        SortedSet<MyKey> list = new SortedSet<MyKey>() {
             new MyKey(0, 0, new DateTime(2015, 6, 4)),
            new MyKey(0, 1, new DateTime(2015, 6, 3)),
            new MyKey(1, 1, new DateTime(2015, 6, 3)),
            new MyKey(0, 0, new DateTime(2015, 6, 3)),
            new MyKey(1, 0, new DateTime(2015, 6, 3)),

        };
        foreach(var entry in list) {
            Console.WriteLine(string.Join(", ", entry.X, entry.Y, entry.Date));

        }
        Console.ReadKey();
    }
}

I changed your MyPoint class as follows:

public sealed class MyKey : IEquatable<MyKey>, IComparable<MyKey> {
    public readonly short X;
    public readonly short Y;
    public readonly DateTime Date;

    public MyKey(short x, short y, DateTime date) {
        this.X = x;
        this.Y = y;
        this.Date = date;
    }

    public override bool Equals(object that) {
        return this.Equals(that as MyKey);
    }

    public bool Equals(MyKey that) {
        if(that == null) {
            return false;
        }

        return this.Date == that.Date
            && this.X == that.X
            && this.Y == that.Y;
    }

    public static bool operator ==(MyKey lhs, MyKey rhs) {
        return lhs != null ? lhs.Equals(rhs) : rhs == null;
    }

    public static bool operator !=(MyKey lhs, MyKey rhs) {
        return lhs != null ? !lhs.Equals(rhs) : rhs != null;
    }

    public override int GetHashCode() {
        int result;
        unchecked {
            result = (int)X;
            result = 31 * result + (int)Y;
            result = 31 * result + Date.GetHashCode();
        }
        return result;
    }

    public int CompareTo(MyKey that) {
        int result = this.X.CompareTo(that.X);
        if(result != 0) {
            return result;
        }
        result = this.Y.CompareTo(that.Y);
        if(result != 0) {
            return result;
        }
        result = this.Date.CompareTo(that.Date);
        return result;
    }
}

Output:

0, 0, 03.06.2015 00:00:00
0, 0, 04.06.2015 00:00:00
0, 1, 03.06.2015 00:00:00
1, 0, 03.06.2015 00:00:00
1, 1, 03.06.2015 00:00:00
Binkan Salaryman
  • 3,008
  • 1
  • 17
  • 29
  • Thanks for the post, but I'm not sure how using a `SortedSet` helps the lookup times. Can you show an example? – jocull Jun 03 '15 at 15:46