69

What does GetHashCode() calculate when invoked on the byte[] array? The 2 data arrays with equal content do not provide the same hash.

Chesnokov Yuriy
  • 1,760
  • 5
  • 21
  • 35
  • 15
    FYI: If you are on .NET4 ((IStructuralEquatable) myArray).GetHashCode(EqualityComparer.Default) should give the same result for two arrays with same content. http://msdn.microsoft.com/en-us/library/system.collections.istructuralequatable.aspx. – Just another metaprogrammer Aug 30 '11 at 15:25
  • Related: [why-does-.NET-not-implement-gethashcode-for-collections](http://stackoverflow.com/questions/2907372/why-does-c-sharp-not-implement-gethashcode-for-collections) – nawfal Aug 09 '14 at 11:32
  • 1
    @FuleSnabel Note that `IStructuralEquatable` requires use of the non-generic `IEqualityComparer` interface, which means that each `byte` in the array is going to be boxed during computation of the hashcode. – cdhowie Nov 13 '15 at 17:32

6 Answers6

77

Arrays in .NET don't override Equals or GetHashCode, so the value you'll get is basically based on reference equality (i.e. the default implementation in Object) - for value equality you'll need to roll your own code (or find some from a third party). You may want to implement IEqualityComparer<byte[]> if you're trying to use byte arrays as keys in a dictionary etc.

EDIT: Here's a reusable array equality comparer which should be fine so long as the array element handles equality appropriately. Note that you mustn't mutate the array after using it as a key in a dictionary, otherwise you won't be able to find it again - even with the same reference.

using System;
using System.Collections.Generic;

public sealed class ArrayEqualityComparer<T> : IEqualityComparer<T[]>
{
    // You could make this a per-instance field with a constructor parameter
    private static readonly EqualityComparer<T> elementComparer
        = EqualityComparer<T>.Default;

    public bool Equals(T[] first, T[] second)
    {
        if (first == second)
        {
            return true;
        }
        if (first == null || second == null)
        {
            return false;
        }
        if (first.Length != second.Length)
        {
            return false;
        }
        for (int i = 0; i < first.Length; i++)
        {
            if (!elementComparer.Equals(first[i], second[i]))
            {
                return false;
            }
        }
        return true;
    }

    public int GetHashCode(T[] array)
    {
        unchecked
        {
            if (array == null)
            {
                return 0;
            }
            int hash = 17;
            foreach (T element in array)
            {
                hash = hash * 31 + elementComparer.GetHashCode(element);
            }
            return hash;
        }
    }
}

class Test
{
    static void Main()
    {
        byte[] x = { 1, 2, 3 };
        byte[] y = { 1, 2, 3 };
        byte[] z = { 4, 5, 6 };

        var comparer = new ArrayEqualityComparer<byte>();

        Console.WriteLine(comparer.GetHashCode(x));
        Console.WriteLine(comparer.GetHashCode(y));
        Console.WriteLine(comparer.GetHashCode(z));
        Console.WriteLine(comparer.Equals(x, y));
        Console.WriteLine(comparer.Equals(x, z));
    }
}
Jon Skeet
  • 1,421,763
  • 867
  • 9,128
  • 9,194
  • @Chesnokov Yuriy: Okay, I've edited some code into my answer. – Jon Skeet Aug 30 '11 at 20:11
  • 5
    There seems to be [some debate](http://stackoverflow.com/a/468084/1149773) on whether `GetHashCode` should scan over the entire sequence. Interestingly, the [internal implementation](http://referencesource.microsoft.com/#mscorlib/system/array.cs,807) for `Array.IStructuralEquatable.GetHashCode` only considers the last eight items of an array, sacrificing hash uniqueness for speed. – Douglas Feb 13 '16 at 17:26
  • 1
    I did something similar using Enumerable.SequenceEqual(). Is there a particular reason to hand-code the element comparison? (Admittedly it's probably a bit faster.) – Peter - Reinstate Monica Feb 15 '17 at 19:15
  • @PeterA.Schneider: I don't *think* `SequenceEqual` is optimized to compare lengths first if the source implements appropriate interfaces. – Jon Skeet Feb 15 '17 at 20:00
  • @JonSkeet Since we have new primitives like `Memory`, `Span` or `Sequence` can this code be optimised in any way? For example we do have `SequenceEqual` for `ReadOnlySpan` now. – bitbonk Oct 01 '20 at 09:42
  • @bitbonk: I don't know whether that would be any faster - maybe; you'd need to run actual benchmarks for it. (It's possible that SequenceEqual is optimized to compare 8 bytes at a time, for example.) – Jon Skeet Oct 01 '20 at 09:52
23

Like other non-primitive built-in types, it just returns something arbitrary. It definitely doesn't try to hash the contents of the array. See this answer.

Community
  • 1
  • 1
mqp
  • 70,359
  • 14
  • 95
  • 123
14

Simple solution

    public static int GetHashFromBytes(byte[] bytes)
    {
        return new BigInteger(bytes).GetHashCode();
    }
Daniil Sokolyuk
  • 484
  • 1
  • 6
  • 10
  • 4
    Seeing this solution made me smile. Clean, elegant. Digging deeper the hash implementation ends up calling https://github.com/microsoft/referencesource/blob/master/System.Numerics/System/Numerics/NumericsHelpers.cs#L272 – Guy Langston Apr 17 '20 at 08:22
  • cough cough GetHashCode(); returns int32. – Xeorge Xeorge Apr 26 '20 at 18:43
  • 1
    @XeorgeXeorge so? – Dave Jellison Dec 11 '20 at 07:45
  • 1
    @DaveJellison There is a `(2^32)` in 1 chance of collision, which is negalegible for most scenarios but is something that must be kept in mind whenever there's a hash code. – fjch1997 Dec 26 '20 at 03:14
  • 2
    Agreed, but this is inherent with hashing as a rule. It's like going to the dictionary.com to complain about the definition of a word. – Dave Jellison Jan 02 '21 at 13:49
  • 2
    Note this method incurs a copy of the whole byte array, so may not be efficient. Also It's important to understand the purpose of GetHashCode() - it's not intended to produce a unique value but rather a well-distributed value for allocating buckets in a Dictionary or HashSet, which benefit from each bucket being roughly equal size. Both types use a combination of GetHashCode() and Equals() to determine whether a collision has really occurred. – Steve Pick Sep 03 '21 at 11:21
13

byte[] inherits GetHashCode() from object, it doesn't override it. So what you get is basically object's implementation.

rickythefox
  • 6,601
  • 6
  • 40
  • 62
4

If you are using .NET 6 or at least .NET Core 2.1, you can write less codes and achieve better performance with System.HashCode struct.

Using the method HashCode.AddBytes() which available from .NET 6:

public int GetHashCode(byte[] value)
{
    var hash = new HashCode();
    hash.AddBytes(value);
    return hash.ToHashCode();
}

Using the method HashCode.Add which available from .NET Core 2.1:

public int GetHashCode(byte[] value) =>
    value.Aggregate(new HashCode(), (hash, i) => {
        hash.Add(i);
        return hash;
    }).ToHashCode();

Note in the document of HashCode.AddBytes() it says:

This method does not guarantee that the result of adding a span of bytes will match the result of adding the same bytes individually.

In this sharplab demo they are just output same result, but this might be varying from .NET version or runtime environment.

n0099
  • 520
  • 1
  • 4
  • 12
1

If it's not the same instance, it will return different hashes. I'm guessing it is based on the memory address where it is stored somehow.

jishi
  • 24,126
  • 6
  • 49
  • 75