11

I have a big dictionary where the key is decimal, but the GetHashCode() of System.Decimal is disasterously bad. To prove my guess, I ran a for loop with 100.000 neigboring decimals and checked the distribution. 100.000 different decimal numbers used only 2 (two!!!) different hashcodes.

Decimal is represented as 16 bytes. Just like Guid! But the GetHashCode() distribution of Guid is pretty good. How can I convert a decimal to Guid in C# as cheap as possible? Unsafe code is OK!


EDIT: The test was requested, so here is the code:

decimal d = 96000000000000000000m;
Dictionary<int, int> hashcount = new Dictionary<int, int>();
int length = 100000;
for (int i = 0; i < length; i++)
{
    int hashcode = d.GetHashCode();
    int n;
    if (hashcount.TryGetValue(hashcode, out n))
    {
        hashcount[hashcode] = n + 1;
    }
    else
    {
        hashcount.Add(hashcode, 1);
    }
    d++;
}

Console.WriteLine(hashcount.Count);

This prints 7. I do not remember the starting decimal that gave me 2.

Hooked
  • 84,485
  • 43
  • 192
  • 261
user256890
  • 3,396
  • 5
  • 28
  • 45
  • 2
    Can you elaborate on the 100.000 decimals you tested? I just tried a small test of 1.000 neighbouring decimals and got 1.000 different hash codes. – Martin Liversage Aug 25 '10 at 07:55
  • 1
    `Enumerable.Range(0, 100000).Select(i => 1000000M + i/100000000000000M).Select(d => d.GetHashCode()).Distinct().Count()` returns 10 which demonstrates that the distribution is bad when the significands are very close. – Martin Liversage Aug 25 '10 at 08:19
  • @Martin: See this Eric Lipper's blog entry http://blogs.msdn.com/b/ericlippert/archive/2010/03/22/socks-birthdays-and-hash-collisions.aspx – abatishchev Aug 25 '10 at 08:22
  • @abatishchev: Eric's blog doesn't explain why it's *quite* so bad as that. – Jon Skeet Aug 25 '10 at 08:54
  • @Martin: Wow - it's even worse in .NET 4... I only get *one* value then! – Jon Skeet Aug 25 '10 at 08:56
  • I've just tried my answer with the same numbers - I got 100,000 distinct values :) – Jon Skeet Aug 25 '10 at 08:56
  • Decimal.GetHashCode() is computed by converting the value to double. That produces 8 bytes, the upper and lower set of 4 are xor-ed. You are having a problem because double can only store 15 significant digits. – Hans Passant Aug 27 '10 at 12:40

4 Answers4

25

EXTREMELY HACKY SOLUTION (but probably fastest possible)

public static class Utils
{
    [StructLayout(LayoutKind.Explicit)]
    struct DecimalGuidConverter
    {
        [FieldOffset(0)]
        public decimal Decimal;
        [FieldOffset(0)]
        public Guid Guid;
    }

    private static DecimalGuidConverter _converter;
    public static Guid DecimalToGuid(decimal dec)
    {
        _converter.Decimal = dec;
        return _converter.Guid;
    }
    public static decimal GuidToDecimal(Guid guid)
    {
        _converter.Guid = guid;
        return _converter.Decimal;
    }
}

// Prints 000e0000-0000-0000-8324-6ae7b91d0100
Console.WriteLine(Utils.DecimalToGuid((decimal) Math.PI));

// Prints 00000000-0000-0000-1821-000000000000
Console.WriteLine(Utils.DecimalToGuid(8472m));

// Prints 8472
Console.WriteLine(Utils.GuidToDecimal(Guid.Parse("00000000-0000-0000-1821-000000000000")));
Timwi
  • 65,159
  • 33
  • 165
  • 230
  • Hmmmmmm, I always thought overlapping fields to only apply when crossing native boundaries. Wow, it works! Learnt something new today :) +1 – leppie Aug 25 '10 at 07:56
  • WAAAAAOOOOO, this is genius! Studying the Guid.GetHashCode() in Reflector, hashvalue depend primarily on the first 8 bytes of the Guid. While in the decimal, the first 8 bytes varry the least. Any idea, how to reverse the byte order, or swap the first and the last 8 bytes? – user256890 Aug 25 '10 at 08:02
  • 4
    @user256890: Since you want it as fast as possible, just use the same trick again: http://csharp.pastebin.com/Jqg9F9HA – Timwi Aug 25 '10 at 08:11
  • 1
    It would need to be profiled, but I rather suspect that this is faster: for `DecimalToGuid`: `return *((Guid*)((void*)&dec));` and for `GuidToDecimal`: `return *((decimal*)((void*)&guid));` - you need to add an `unsafe`, but a union struct is *already* inherently unsafe - so no change introduced there. – Marc Gravell Jun 14 '13 at 09:10
  • 1
    I think the above may work when using `DecimalToGuid(decimal dec)` but not `GuidToDecimal(Guid guid)`. Here's an example of a guid that returns an invalid decimal `"{dc78377c-89e6-43a9-8155-bd6e395d041a}"` – Laith Jan 09 '17 at 04:33
  • Warning: Type of "Decimal" cannot represent the full of 2^128 different values due to its special bit useage rules. The method in this answer will likely cause irreversible conversions and data broken. – Mr. Squirrel.Downy May 24 '22 at 02:06
5

If you're just trying to get a different hash algorithm, there's no need to convert to a Guid. Something like this:

public int GetDecimalHashCode(decimal value)
{
    int[] bits = decimal.GetBits(value);
    int hash = 17;
    foreach (int x in bits)
    {
        hash = hash * 31 + x;
    }
    return hash;
}

(Obviously substitute a different algorithm if you want.)

Admittedly this still involves creating an array, which isn't ideal. If you really want to create a Guid you could use the code above to get the bits and then a long Guid constructor passing in appropriate values from the array.

I'm somewhat suspicious of the decimal hashcode being so bad though. Do you have some sample code for that?

Jon Skeet
  • 1,421,763
  • 867
  • 9,128
  • 9,194
  • 2
    How would you use this when storing `decimal` values in a dictionary? Surely the dictionary will invariably use the original `decimal::GetHashCode()`? – Timwi Aug 25 '10 at 08:20
  • 3
    Just pass your custom IEqualityComparer in dictionary constructor – digEmAll Aug 25 '10 at 08:25
  • @Timwi But the hashcode **doesn't have to be unique** within a dictionary, so that wouldn't matter so much. – Rowland Shaw Aug 25 '10 at 08:30
  • @Rowland: sure, but dictionary performances decrease with bad hashing (the famous ~O(1) key-lookup time will only work with very good hashing) – digEmAll Aug 25 '10 at 08:33
  • @Rowland: I don’t see how your comment relates to mine at all. – Timwi Aug 25 '10 at 09:02
  • @Timwi I was responding to the fact that the dictionary would refer to decimal's `GetHashCode()` - whilst not brilliant (if the OP's sample data is distributed in a similar manner to their test data), it would still *work*, so it maybe a case of worrying unduely which implementation gets called. – Rowland Shaw Aug 25 '10 at 11:01
  • @Timwi: The constructor of `Dictionary` can accept an `IEqualityComparer`; one could thus implement a `DecimalComparer` class which would behave as indicated here. Note that if one computes hash codes based upon binary representation, one must compare equality likewise and be aware of what it "means"; notably, `1.0m` and `1.00m` will compare as distinct. – supercat Mar 06 '14 at 21:05
0

Convert your decimal value to byte array, and then create a guid from it:

public static byte[] DecimalToByteArray (decimal src) 
{
    using (MemoryStream stream = new MemoryStream()) 
    {
        using (BinaryWriter writer = new BinaryWriter(stream))
        {
            writer.Write(src);
            return stream.ToArray();
        }
    }
}

Decimal myDecimal = 1234.5678M;
Guid guid = new Guid(DecimalToByteArray(myDecimal));
Nissim
  • 6,395
  • 5
  • 49
  • 74
  • 1
    While I bet it works, it does not look fast. This code is to be used in a dictionary! – user256890 Aug 25 '10 at 07:53
  • 2
    Dude, if you want it to work fast - say it! don't immediately give a down vote... people are here to help you. My solution answer you question - nowhere in your question did you say it HAVE to be fast. – Nissim Aug 25 '10 at 07:57
  • By the way - I tested it on 100000 values and it took 00:00:00.0754544 seconds (75 ticks) - believe me - that's fast enough – Nissim Aug 25 '10 at 08:02
  • I did say "cheap". And I did not vote it down. It was someone else. – user256890 Aug 25 '10 at 08:04
  • @user: while Nissim's answer certainly wouldn't be fast, i doubt that *anything* will be very fast for this. Are the keys being generated each time you create the dictionary, or are they static bits of data (ie keys from some data source)? – slugster Aug 25 '10 at 08:19
0

The distribution of GUID is good as it is meant to be unique...

What is the range of numbers used for this? The default GetHashcode() implementation for Decimal might only take a certain range of values into consideration.

leppie
  • 115,091
  • 17
  • 196
  • 297