
I have the class below:

class Group
{
    public Collection<int> UserIds { get; set; }
    public int CreateByUserId { get; set; }
    public int HashKey { get; set; }
}

I want to generate a unique hash key based on UserIds and CreateByUserId, store it in MongoDB, and search on it.

Conditions:

  1. The hash key should always be the same for the same UserIds and CreateByUserId.
  2. The hash key should be different when the number of users in UserIds increases.

As a solution, I am overriding the GetHashCode() function:

public override int GetHashCode()
{
    unchecked
    {
        var hash = (int)2166136261;
        const int fnvPrime = 16777619;

        List<int> users = new List<int>() { CreateByUserId };
        UserIds.ToList().ForEach(x => users.Add(x));
        users.Sort();

        users.ForEach(x => hash = (hash * fnvPrime) ^ x.GetHashCode());
        return hash;
    }
}

Is this a good solution, or can you suggest a better one?

Rand Random
Ankit Vaidya
  • I don't think you need the ToList – paparazzo May 07 '18 at 08:31
  • Read [MSDN](https://msdn.microsoft.com/en-us/library/system.object.gethashcode(v=vs.110).aspx): `Do not use the hash code as the key to retrieve an object from a keyed collection.` – dymanoid May 07 '18 at 08:39
  • Hash codes aren't required to be the same between runs, and often aren't (for example, string now gives different hashes per run by default); you should never use hash codes with external systems – Marc Gravell May 07 '18 at 08:42
  • By the pigeonhole principle you cannot create a unique int for every combination of more than one int. To see this, consider that an int holds N values. If you have two ints (say two user IDs) then there are N^2 possible combinations of them, but you only have N different ints to represent them. – Chris May 07 '18 at 11:26

3 Answers

1

If the intention is to save the hash value in the database, don't override GetHashCode on the object. That method is meant for hash tables (Dictionary, HashSet, ...) in conjunction with Equals, and its 32-bit result is not unique enough for your purpose. Instead, use an established hash function such as SHA1.

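using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

public string Hash(IEnumerable<int> values)
{
    using (var hasher = new SHA1Managed())
    {
        // Join the ids into one string, hash its UTF-8 bytes, and return the digest as hex.
        var hash = hasher.ComputeHash(Encoding.UTF8.GetBytes(string.Join("-", values)));
        return BitConverter.ToString(hash).Replace("-", "");
    }
}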

Usage:

var hashKey = Hash(UserIds.Concat(new[] { CreateByUserId }));

Sort UserIds first if the key should not depend on the order of the ids.
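
The sorting remark can be sketched as follows. This is only an illustration under a few assumptions that are not in the answer: the helper name `GroupKey.Compute` is made up, the creator id is kept in a fixed first position instead of being appended to the list, and the ids are formatted with the invariant culture so the string is the same on every machine.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

public static class GroupKey
{
    // Hypothetical helper: builds a deterministic key string for a group.
    public static string Compute(int createByUserId, IEnumerable<int> userIds)
    {
        // Sort so that {1, 2} and {2, 1} produce the same key.
        var sortedIds = userIds
            .OrderBy(id => id)
            .Select(id => id.ToString(CultureInfo.InvariantCulture));

        // Assumption: the creator stays in a fixed first position, separated
        // by ':' so it cannot be confused with a member id.
        string canonical = createByUserId.ToString(CultureInfo.InvariantCulture)
            + ":" + string.Join(",", sortedIds);

        using (var sha1 = SHA1.Create())
        {
            byte[] digest = sha1.ComputeHash(Encoding.UTF8.GetBytes(canonical));
            // 20-byte SHA-1 digest rendered as 40 hex characters.
            return BitConverter.ToString(digest).Replace("-", "");
        }
    }
}
```

With this sketch, `GroupKey.Compute(7, new[] { 1, 2, 3 })` and `GroupKey.Compute(7, new[] { 3, 1, 2 })` return the same string, while adding a fourth user id changes the input string and, collisions aside, the key.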

Magnus
0

A hash key is a value calculated to check quickly whether a call to Equals() could return true. It allows a fast decision on whether an element might be the right one or is definitely the wrong one.

First, replace the term HashKey with Unique Id.

If you want a unique Id, I'd recommend letting the database generate it in an Id column, since you store the data there anyway, and then fetching the Id along with the other data. In MongoDB, each entry also already has its own Id: See here

Each object in mongo already has an id, and they are sortable in insertion order. What is wrong with getting collection of user objects, iterating over it and use this as incremented ID?[...]

That way, use the DB for the unique ID and calculate your HashKey (if you still need it) with simple, cheap math such as adding up the user Ids.

To do it programmatically: if you want to check in code and ignore the Ids from the DB, you need to implement the GetHashCode() and Equals() methods of the given objects.

class Group
{
    public Collection<int> UserIds { get; set; }
    public int CreateByUserId { get; set; }

    public override bool Equals(object obj)
    {
        Group objectToCompare = (Group)obj;

        if (this.UserIds.Count != objectToCompare.UserIds.Count)
            return false;

        if (this.CreateByUserId != objectToCompare.CreateByUserId)
            return false;

        foreach (int ownUserId in this.UserIds)
            if (!objectToCompare.UserIds.Contains(ownUserId))
                return false;
        //some elements might be double, i.e. 1: 1,2,2 vs 2: 1,2,3 => not equal. cross check to avoid this error
        foreach (int foreignUserId in objectToCompare.UserIds)
            if (!this.UserIds.Contains(foreignUserId))
                return false;

        return true;
    }

    public override int GetHashCode()
    {
        int sum = CreateByUserId;
        foreach (int userId in UserIds)
            sum += userId;

        return sum;
    }
}

Usage:

Group group1 = new Group() { UserIds = ..., CreateByUserId = ...};
Group group2 = new Group() { UserIds = ..., CreateByUserId = ...};
group1.Equals(group2);

Here is the answer to "Why do we need the GetHashCode-Function when we use Equals?"

Note: This is for sure not the most performant solution for the Equals()-Method here. Adjust as needed.

Chrᴉz remembers Monica
  • I want to calculate a unique key and, based on it, add a new entry if one with the same users and CreateByUserId does not already exist. The Mongo _id doesn't help here. Please suggest some other package to calculate a unique key – Ankit Vaidya May 07 '18 at 08:50
  • @AnkitVaidya Creating unique keys on your own without a sequence is very hard. The easiest way is to check against the DB whether the entry already exists. If you really want to do it programmatically, you need to make various assumptions. E.g. if all user Ids have at most 4 digits (<10000), you can simply add 10^0 x 1st Id + 10^4 x 2nd Id + 10^8 x 3rd Id -> 1, 508, 3242 => 3242|0508|0001. This leads to big numbers for lists with many user Ids. I'll expand my answer for further details. – Chrᴉz remembers Monica May 07 '18 at 08:58
  • @AnkitVaidya a hash code is the *opposite* of a unique index. It's *guaranteed* to be non-unique by its definition. Why do you need *another* unique ID anyway, when MongoDB already adds one to each record? Explain what your *real* problem is, not how you think it can be solved. Why do you need to generate the key on the *client* ? Are you trying to add related items? Or store the key in an external system *before* actually adding the record? Why not retrieve the keys after insertion? – Panagiotis Kanavos May 07 '18 at 09:17
  • @Panagiotis Kanavos I want to add a new entry in Mongo based on userIds and createByUserId. For this I first have to search for an existing entry with the same userIds and createByUserId. If one is present, I return its _id; otherwise I add a new record and return its _id. The Mongo _id doesn't come into the picture here, because the user inputs userIds and createByUserId and, based on these, I return a new _id or the existing one. – Ankit Vaidya May 07 '18 at 09:24
  • Also there are mistakes in Equals – SENya May 07 '18 at 09:31
  • @SENya 1. This answer should provide a way to a solution, the exact implementation should be done by the OP regarding his exact environment. I hope there are better solutions, but `prop1.GetHashCode*17+Prop2.GetHashCode()` makes it clearer. 2. Please correct my mistakes or tell them so I can fix them myself. – Chrᴉz remembers Monica May 07 '18 at 09:37
  • @SENya `Collection` has no implementation of `GetHashCode()` of its own (it uses `object.GetHashCode()`), and the implementation of `GetHashCode()` for Int32 [returns the Int32 itself](https://stackoverflow.com/questions/3893782/how-is-gethashcode-implemented-for-int32). So there may be other ways to implement it, but the simple approach, as posted in your comment, doesn't work here. – Chrᴉz remembers Monica May 07 '18 at 09:47
  • @zuq first of all, you can get an unwanted overflow error (when the code is compiled in a checked context) in `foreach (int userId in UserIds) sum += userId;` It should be at least: `foreach (int userId in UserIds) { unchecked { sum += userId; } } ` – SENya May 07 '18 at 09:54
  • @zuq second, your function is bad; it easily leads to collisions. For example, take two objects with the same `CreateByUserID` and the following UserIDs: { 1, 4 } and { 2, 3 }. They will produce the same hash. I apologise for the link, I intended to give this link with Jon Skeet's answer https://stackoverflow.com/questions/263400/what-is-the-best-algorithm-for-an-overridden-system-object-gethashcode – SENya May 07 '18 at 09:57
  • @zᴉɹɥƆ In the Equals method there are no checks for the type of obj. You'll get an InvalidCastException if you call it with some other type, and a NullReferenceException if you pass null. Also, you only check for set equality. The author didn't mention the type of equality he needed. Should we compare collections with order preserved, or just check whether they contain the same elements? – SENya May 07 '18 at 10:06
0

In general, without some extra information about the data, you cannot create a unique integer from a whole bunch of other integers. You cannot create a unique int key even from a single long value if there are no constraints on its range of allowed values.
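
A tiny sketch of the point about long values. The helper `PigeonholeDemo.Truncate` is hypothetical and simply keeps the low 32 bits; any other way of squeezing 64 bits into 32 has to map some distinct inputs to the same output as well.

```csharp
using System;

public static class PigeonholeDemo
{
    // Hypothetical helper: squeezes a long into an int by keeping the low 32 bits.
    public static int Truncate(long value)
    {
        return unchecked((int)value);
    }
}
```

Here `PigeonholeDemo.Truncate(1L)` and `PigeonholeDemo.Truncate(4294967297L)` (that is, 2^32 + 1) both return 1, so two different long values end up with the same int key.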

The GetHashCode function does not guarantee that you get a unique integer hash key for every possible Group object. However, a good hash function tries to minimize collisions, that is, cases where the same hash code is generated for different objects. There are good examples of hash functions in this SO answer: What is the best algorithm for an overridden System.Object.GetHashCode?
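
To make "minimize collisions" concrete, here is a small comparison sketch. The class and method names are made up for illustration; `SumHash` mirrors the additive approach discussed in the comments, and `CombineHash` uses the multiply-and-add scheme with the constants 17 and 23.

```csharp
using System.Collections.Generic;

public static class HashDemo
{
    // Additive hash: simply sums the ids.
    public static int SumHash(IEnumerable<int> ids)
    {
        unchecked // wrap around silently instead of throwing on overflow
        {
            int sum = 0;
            foreach (int id in ids)
                sum += id;
            return sum;
        }
    }

    // Multiply-and-add hash: each id is mixed with the running value.
    public static int CombineHash(IEnumerable<int> ids)
    {
        unchecked
        {
            int hash = 17;
            foreach (int id in ids)
                hash = hash * 23 + id;
            return hash;
        }
    }
}
```

For the id lists { 1, 4 } and { 2, 3 }, `SumHash` returns 5 in both cases, while `CombineHash` returns 9020 and 9042. The second scheme still collides for other inputs; it just does so less systematically.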

Usually you need GetHashCode to store an object as a key in dictionaries and hash sets. As the previous answer said, you need to override the Equals method in that case, because hash tables such as dictionaries and hash sets resolve collisions by storing items with the same hash code in lists called buckets. They use the Equals method to identify the item within the bucket. It is recommended practice to override Equals whenever you override GetHashCode, just as a precaution.

It was not specified what type of equality you should expect from 'Group' objects. Imagine two objects with the same CreateByUserID and the following UserIds: {1, 2} and {2, 1}. Are they equal? Or does the order matter?
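
The two kinds of equality can be sketched side by side. The class and method names below are hypothetical; `SequenceEqual` and `HashSet<T>.SetEquals` are the standard library calls doing the work.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class EqualityKinds
{
    // Order matters: {1, 2} and {2, 1} are different.
    public static bool SameSequence(IEnumerable<int> a, IEnumerable<int> b)
        => a.SequenceEqual(b);

    // Order and duplicates are ignored: {1, 2} and {2, 1} are the same.
    public static bool SameSet(IEnumerable<int> a, IEnumerable<int> b)
        => new HashSet<int>(a).SetEquals(b);
}
```

For {1, 2} and {2, 1}, `SameSequence` returns false and `SameSet` returns true. Whichever rule is chosen, the hash key has to follow the same rule, otherwise equal groups could end up with different keys.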

It's not a good idea to allow changes to Group fields from any place. I would implement it with read-only members like this:

class Group : IEquatable<Group>
{
    private readonly Collection<int> userIds;

    public ReadOnlyCollection<int> UserIds { get; }
    public int CreateByUserId { get; }
    public int HashKey { get; private set; } // private setter: AddUserID reassigns it

    public Group(int createByUserId, IList<int> createdByUserIDs)
    {
        CreateByUserId = createByUserId;
        userIds = createdByUserIDs != null 
           ? new Collection<int>(createdByUserIDs)
           : new Collection<int>();
        UserIds = new ReadOnlyCollection<int>(userIds);

        HashKey = GetHashCode();
    }

    public void AddUserID(int userID)
    {
        userIds.Add(userID);
        HashKey = GetHashCode();
    }

    //IEquatable<T> implementation is generally a good practice in such cases, especially for value types
    public override bool Equals(object obj) => Equals(obj as Group);

    public bool Equals(Group objectToCompare)
    {
        if (objectToCompare == null)
            return false;

        if (ReferenceEquals(this, objectToCompare))
            return true;

        if (UserIds.Count != objectToCompare.UserIds.Count || CreateByUserId != objectToCompare.CreateByUserId)
            return false;

        //If you need equality when order matters - use this
        //return UserIds.SequenceEqual(objectToCompare.UserIds);


        //This is for set equality. If this is your case and you don't allow duplicates then I would suggest to use HashSet<int> or ISet<int> instead of Collection<int>
        //and use their methods for more concise and effective comparison
        return UserIds.All(id => objectToCompare.UserIds.Contains(id)) && objectToCompare.UserIds.All(id => UserIds.Contains(id));
    }

    public override int GetHashCode()
    {
        unchecked // to suppress overflow exceptions
        {
            int hash = 17;          
            hash = hash * 23 + CreateByUserId.GetHashCode();

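            // Hash the distinct ids in sorted order so the result matches the set
            // equality used in Equals: equal groups must produce equal hash codes.
            foreach (int userId in UserIds.Distinct().OrderBy(id => id))
                hash = hash * 23 + userId.GetHashCode();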

            return hash;
        }
    }
}
SENya