1

I have a collection which is a permutation of two unique orders, where OrderId is unique. Thus it contains the Order1 (Id = 1) and Order2 (Id = 2) as both 12 and 21. Now while processing a routing algorithm, few conditions are checked and while a combination is included in the final result, then its reverse has to be ignored and needn't be considered for processing. Now since the Id is an integer, I have created a following logic:

 private static int GetPairKey(int firstOrderId, int secondOrderId)
        {
            var orderCombinationType = (firstOrderId < secondOrderId)
                ? new {max = secondOrderId, min = firstOrderId}
                : new { max = firstOrderId, min = secondOrderId };

            return (orderCombinationType.min.GetHashCode() ^ orderCombinationType.max.GetHashCode());
        }

In the logic, I create a Dictionary<int,int>, where key is created using the method GetPairKey shown above, where I ensure that out of given combination they are arranged correctly, so that I get the same Hashcode, which can be inserted and checked for an entry in a Dictionary, while its value is dummy and its ignored.

However above logic seems to have a flaw and it doesn't work as expected for all the logic processing, what am I doing wrong in this case, shall I try something different to create a Hashcode. Is something like following code a better choice, please suggest

Tuple.Create(minOrderId,maxOrderId).GetHashCode, following is relevant code usage:

  foreach (var pair in localSavingPairs)
            {
                    var firstOrder = pair.FirstOrder;
                    var secondOrder = pair.SecondOrder;

                   if (processedOrderDictionary.ContainsKey(GetPairKey(firstOrder.Id, secondOrder.Id))) continue;

Adding to the Dictionary, is the following code:

processedOrderDictionary.Add(GetPairKey(firstOrder.Id, secondOrder.Id), 0); here the value 0 is dummy and is not used

Mrinal Kamboj
  • 11,300
  • 5
  • 40
  • 74
  • If the hash code is an integer, and the two ids are integers, you can't create perfectly unique hashes, due to the [pigeonhole principle](https://en.wikipedia.org/wiki/Pigeonhole_principle). You have 2^32 possible hashes, and 2^63 tuples you have to assign them to. – Kevin Nov 23 '15 at 17:31
  • Can you show the code that uses this with a dictionary? From what you describe it might be far from optimal, but it should still work. – Jon Hanna Nov 23 '15 at 17:34
  • Do I have any option to make this work, to get the unique for the combination of Ids – Mrinal Kamboj Nov 23 '15 at 17:34
  • Since dictionaries handle collisions, there's no reason why you should need unique IDs. (Keeping collisions low would certainly be beneficial though). – Jon Hanna Nov 23 '15 at 17:37
  • @JonHanna updated the usage – Mrinal Kamboj Nov 23 '15 at 17:46
  • Oh. I see. yeah, don't do that. Answer to follow. – Jon Hanna Nov 23 '15 at 17:47
  • Are you using hashcodes with the assumption that you can create unique identifiers for a pair of values? That's not going to work out well for you because, fundamentally, hashcodes are not guaranteed to be unique. – spender Nov 23 '15 at 17:52
  • @spender there is such a thing as "perfect hashing" where they do, so in terms of the wider study of hash-codes, what you say is not correct, though it certainly is correct in terms of what is being attempted here. – Jon Hanna Nov 23 '15 at 17:57
  • @JonHanna Sure, but a perfect hash that covers two ints but fits into an int is impossible. – spender Nov 23 '15 at 17:58
  • @spender a perfect hash also isn't possible without prior knowledge of the range of values, so again, not applicable here. – Jon Hanna Nov 23 '15 at 19:49
  • 1
    @JonHanna A "perfect hash" *isn't a hash*. That's a paradoxical concept. The whole point of a hash is something that helps sort data into reasonably distinct buckets for any particular data set. Any data set with a possible range of values greater than the hash set's possible range of values is susceptible to collisions, any data set smaller than the hash set isn't really being hashed (or, more correctly, doesn't even need hashing). – ErikE Nov 23 '15 at 20:29
  • 1
    @ErikE they need hashing in precisely the sort of cases perfect hashes are used for; you can't index into an array or jump-table with `"orange"` or into one with 10 elements with both `12` and `1234212`. – Jon Hanna Nov 24 '15 at 10:01
  • 1
    @jon I guess I see your point, however I still maintain that "perfect hash" is by itself nonsense. A custom hash over a defined dataset that yields no collisions (because the hash space is larger than the defined set) is possible, but far from "perfect". – ErikE Nov 24 '15 at 17:01

3 Answers3

2

First, 42.GetHashCode() returns 42. Second, 1 ^ 2 is identical to 2 ^ 1, so there's really no point in sorting numbers. Third, your "hash" function is very weak and produces a lot of collisions, which is why you're observing the flaws.

There are two options I can think of right now:

  • Use a slightly "stronger" hash function
  • Replace your Dictionary<int, int> key with Dictionary<string, int> with keys being your two sorted numbers separated by whatever character you prever -- e.g. 56-6472
Community
  • 1
  • 1
Anton Gogolev
  • 113,561
  • 39
  • 200
  • 288
  • My understanding remains a Dictionary where I can provide a unique string, as suggested, using any acceptable special character like '-' or '%', would always generate a unique key for a given combination or do I still have a scope of dealing with the situation where there could be collision – Mrinal Kamboj Nov 24 '15 at 08:16
2

Given that XOR is commutative (so (a ^ b) will always be the same as (b ^ a)) it seems to me that your ordering is misguided... I'd just

(new {firstOrderId, secondOrderId}).GetHashCode()

.Net will fix you up a good well-distributed hashing implementation for anonymous types.

spender
  • 117,338
  • 33
  • 229
  • 351
  • Thanks let me try, but still I would need to sort the Ids correct, that is how I would get a unique Id for a given combination – Mrinal Kamboj Nov 23 '15 at 17:46
  • What do you mean by "sort the ids"? You're not guarantee to get a unique hash code, period. – Lasse V. Karlsen Nov 23 '15 at 17:48
  • @MrinalKamboj Ah... now we move on to a different issue altogether. Using a hashcode as a measure of uniqueness is a terrible idea. There is no guarantee that it will produce a unique id. That is not what hashcodes are for. If you want a guaranteed unique value, then I'd consider a `long` with `(value1<<32) + value2` – spender Nov 23 '15 at 17:48
  • I think you're missing what the OP is asking for. He wants to make sure that the set `{ 1, 2 }` produces the same hashcode as `{ 2, 1 }`. I do think he is a little unclear on hash collisions and needs help there, but I think the answer is to introduce him to custom IEqualityComparers. His method of ensuring the same hashcode is produced is ordering the two numbers so `{ 2, 1 }` is resolved to `{ 1, 2 }` before hashing. Thus, your answer seems to miss the mark as it completely ignores this aspect. – ErikE Nov 23 '15 at 17:51
2

You need a value that can uniquely represent every possible value.

That is different to a hash-code.

You could uniquely represent each value with a long or with a class or struct that contains all of the appropriate values. Since after a certain total size using long won't work any more, let's look at the other approach, which is more flexible and more extensible:

public class KeyPair : IEquatable<KeyPair>
{
  public int Min { get; private set; }
  public int Max { get; private set; }

  public KeyPair(int first, int second)
  {
    if (first < second)
    {
      Min = first;
      Max = second;
    }
    else
    {
      Min = second;
      Max = first;
    }
  }

  public bool Equals(KeyPair other)
  {
    return other != null && other.Min == Min && other.Max == Max;
  }

  public override bool Equals(object other)
  {
    return Equals(other as KeyPair);
  }

  public override int GetHashCode()
  {
    return unchecked(Max * 31 + Min);
  }
}

Now, the GetHashCode() here will not be unique, but the KeyPair itself will be. Ideally the hashcodes will be very different to each other to better distribute these objects, but doing much better than the above depends on information about the actual values that will be seen in practice.

The dictionary will use that to find the item, but it will also use Equals to pick between those where the hash code is the same.

(You can experiment with this by having a version for which GetHashCode() always just returns 0. It will have very poor performance because collisions hurt performance and this will always collide, but it will still work).

Jon Hanna
  • 110,372
  • 10
  • 146
  • 251
  • 1
    You might want to consider enclosing the hashcode calculation with `unchecked` – spender Nov 23 '15 at 18:06
  • This is the right answer. Just addressing how to compute a hashcode isn't going to solve the OP's issue. Your answer could be improved by showing him that the `KeyPair` itself is used as the key of the dictionary `Add` method. – ErikE Nov 23 '15 at 18:29
  • @ErikE And, amusingly enough, the dictionary itself will internally compute a hash-code for the pair in a way that would have been a good answer for the OP. – btilly Nov 23 '15 at 20:00
  • 1
    @btilly I think that's incorrect. Without making the `KeyPair` object implement `IEquatable`, it will use reference equality for object keys, so a `KeyPair` not implementing `IEquatable` would NOT throw in this scenario: `var dict = new Dictionary(); dict.Add(new KeyPair(1, 2), "first"); dict.Add(new KeyPair(2, 1), "second");` whereas this code DOES throw using this answer's `KeyPair : IEquatable`. – ErikE Nov 23 '15 at 20:24
  • If it was a struct it would have, but poorly. If it was an anonymous class it would do a better job. – Jon Hanna Nov 23 '15 at 21:22
  • @JonHanna thanks for the great explanation and a very nice option to resolve the issue – Mrinal Kamboj Nov 24 '15 at 08:17