I need a way to consistently generate a unique ID from an object, or from a subset of its properties, that stays the same across .NET frameworks / environments / executions and has minimal collisions.

  • Using the same properties over and over again should always generate the same result
  • Collisions should be minimal
  • The properties are a mix of strings, ints, and decimals
  • The values of the properties are not sensitive data
  • Properties won't be added or removed
  • The application that uses this is only internal (not open to the public)
  • I don't want to rely on GetHashCode which could vary across .NET frameworks / environments / executions as described in this great article at andrewlock.net
    • He suggests a possible method but doesn't provide much information on real world use

For example, given a fixed object like

public class ExampleObject
{
    public string Prop1 { get; set; }
    public string Prop2 { get; set; }
    public int Prop3 { get; set; }
    public decimal Prop4 { get; set; }
}

I was thinking about converting all the values to strings, concatenating them, and then using some deterministic algorithm to get the desired result:

// Convert all values to strings and concat
string str1 = "a";
string str2 = "b";
int int1 = 1;
decimal dec1 = 1.2m;
string concatStr = $"{str1}_{str2}_{int1.ToString()}_{dec1.ToString()}";

// Use the string to generate the desired value

I have seen people generate an MD5 hash from the string, but I am not sure how well that holds up in real-world use with millions of possible combinations of the strings / ints / decimals. The decimal-to-string conversion also worries me. The value shouldn't change, but could .ToString() handle the precision differently in different environments?

penguin egg
  • That's not a GUID/UUID. Do a cryptographic hash and then truncate the result to the number of bits you need. SHA-1 is probably good enough (it should be avoided for digital signatures these days, but that's not what you are doing). That's what Git uses. Stringify whatever you want, convert the resulting string to a byte array using an encoding (say, UTF-8) and then use SHA-1 from System.Security.Cryptography – Flydog57 Apr 08 '21 at 22:50
  • @madreflection Good point! Didn't even think of that. Do you mean the commit sha1 hashes? Quick search and it seems similar, they just concat strings together and generate it: https://gist.github.com/masak/2415865 – penguin egg Apr 08 '21 at 22:50
  • You may want to include the class name in the stringified description (and maybe even property names). You could even do a JSON serialization to get property names and values – Flydog57 Apr 08 '21 at 22:53
  • @Flydog57 Thanks for the suggestions. Is including the property names + values or just serializing the whole object as JSON a way to also include the definition of the object, in the case it changes? – penguin egg Apr 08 '21 at 22:55

1 Answer

This will give you a complete signature of your object: the class name, the property names, and the property values. I serialize the object to JSON, prepend the class name, and then hash the whole thing.

First, here's where I compute the hash:

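using System.Security.Cryptography;
using System.Text;
using Newtonsoft.Json; // Json.NET NuGet package

public static byte[] GetObjectHash<T>(T obj)
{
    // Serialize the property names and values to JSON
    var serialized = JsonConvert.SerializeObject(obj);
    // Prepend the compile-time type name so different classes hash differently
    var stringified = typeof(T).Name + serialized;
    var bytes = Encoding.UTF8.GetBytes(stringified);
    using (var sha1 = SHA1.Create())
    {
        return sha1.ComputeHash(bytes);
    }
}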

I have a helper function to convert the hash to a hex string. There are better implementations of this; search this site for one (for example: How do you convert a byte array to a hexadecimal string, and vice versa?).

public static string BytesToHex(byte[] bytes)
{
    var buffer = new StringBuilder(bytes.Length * 2);
    foreach (var byt in bytes)
    {
        buffer.Append(byt.ToString("X2"));
    }

    return buffer.ToString();
}
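
As one such alternative, here is a minimal sketch that uses only the base class library. The names HexSketch and BytesToHexCompact are made up for illustration. BitConverter.ToString produces uppercase hex pairs separated by dashes, so stripping the dashes should give the same output as BytesToHex above.

```csharp
using System;

public static class HexSketch
{
    // Hypothetical helper name; returns uppercase hex such as "0FA0".
    public static string BytesToHexCompact(byte[] bytes)
    {
        // BitConverter.ToString yields "0F-A0"; remove the separators.
        return BitConverter.ToString(bytes).Replace("-", "");
    }
}
```

On .NET 5 and later, Convert.ToHexString(bytes) does the same in a single call.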

Then I test it, using your class under a name I like better:

var obj1 = new HashExampleClass
{
    Prop1 = "hello",
    Prop2 = "world",
    Prop3 = 42,
    Prop4 = 123.45m,
};

var obj2 = new HashExampleClass
{
    Prop1 = "hello",
    Prop2 = "world",
    Prop3 = 42,
    Prop4 = 123.45m,
};

var obj3 = new HashExampleClass
{
    Prop1 = "hello",
    Prop2 = "world",
    Prop3 = 41,
    Prop4 = 123.45m,
};

var hash1 = GetObjectHash(obj1);
var hash2 = GetObjectHash(obj2);
var hash3 = GetObjectHash(obj3);

Debug.WriteLine(BytesToHex(hash1));
Debug.WriteLine(BytesToHex(hash2));
Debug.WriteLine(BytesToHex(hash3));

The output from this looks like:

7190AA1C6ED17E3F302AB61BB5C639320FBA0C00
7190AA1C6ED17E3F302AB61BB5C639320FBA0C00
0B73706DAF7AC39AB0E00CC781AE0FA238A0FC2C

Note that the first two hashes come from different object instances that have exactly the same property values, so their hashes match. The last one differs in a single property value, so its hash is completely different.
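
If 20 bytes is more than you need, the hash can be truncated, as suggested in the comments on the question. Below is a minimal sketch, assuming a 64-bit ID is enough for your use. HashTruncationSketch and ToUInt64Id are hypothetical names, not part of any library.

```csharp
using System;

public static class HashTruncationSketch
{
    // Hypothetical helper: keeps the first 8 bytes (64 bits) of a hash.
    public static ulong ToUInt64Id(byte[] hash)
    {
        if (hash == null || hash.Length < 8)
            throw new ArgumentException("Need at least 8 bytes.", nameof(hash));

        // Assemble the value big-endian by hand so the result does not
        // depend on the machine's byte order (BitConverter uses native order).
        ulong id = 0;
        for (int i = 0; i < 8; i++)
        {
            id = (id << 8) | hash[i];
        }
        return id;
    }
}
```

Truncating raises the odds of a collision. By the usual birthday estimate of roughly n²/2^65 for 64 bits, one million values gives a collision probability on the order of 10^-8, so pick the width based on how many objects you expect.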

If you put a breakpoint in GetObjectHash, you will see that the stringified variable ends up as a string that looks like:

HashExampleClass{"Prop1":"hello","Prop2":"world","Prop3":42,"Prop4":123.45}

(If you look in the debugger, you will see lots of \" escaped quotation marks, but the string is really what I show.)
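
On the decimal worry from the question: as far as I know, Json.NET writes numbers using the invariant culture, so regional settings should not change the JSON. What can differ is scale, because a decimal keeps its trailing zeros: 1.2m and 1.20m are equal values but print differently, so they would hash differently. Here is a minimal sketch of normalizing a decimal before it goes into the string to be hashed. DecimalCanonicalSketch and ToCanonicalString are hypothetical names.

```csharp
using System;
using System.Globalization;

public static class DecimalCanonicalSketch
{
    // Hypothetical helper: formats a decimal without trailing zeros and
    // always with '.' as the separator, regardless of the current culture.
    public static string ToCanonicalString(decimal value)
    {
        // A decimal holds at most 28 fractional digits; each '#' is an
        // optional digit, so trailing zeros are dropped.
        return value.ToString("0.############################", CultureInfo.InvariantCulture);
    }
}
```

Whether this matters depends on where your decimals come from. If they are always produced the same way, the scale will be stable and the JSON approach above is fine as is.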

Flydog57