47

I need to use a byte[] as a key in a Dictionary. Since byte[] doesn't override the default GetHashCode method, two separate byte[] objects that contain the same data will use two separate slots in the dictionary. Basically what I want is this:

Dictionary<byte[], string> dict = new Dictionary<byte[], string>();
dict[new byte[] {1,2,3}] = "my string";
string str = dict[new byte[] {1,2,3}];
// I'd like str to be set to "my string" at this point

Is there a simple way to do this? The only thing I can think of would be to build a wrapper class that just contains a byte[] and override GetHashCode based on the contents of the byte[], but this seems error prone.

Uwe Keim
  • 39,551
  • 56
  • 175
  • 291
Jason
  • 1,411
  • 3
  • 21
  • 28

7 Answers7

79

By default byte[] will be compared by reference which is not what you want in this case. What you need to do is specify a custom IEqualityComparer<byte[]> and do the comparison you want.

For example

public class ByteArrayComparer : IEqualityComparer<byte[]> {
  public bool Equals(byte[] left, byte[] right) {
    if ( left == null || right == null ) {
      return left == right;
    }
    return left.SequenceEqual(right);
  }
  public int GetHashCode(byte[] key) {
    if (key == null)
      throw new ArgumentNullException("key");
    return key.Sum(b => b);
  }
}

Then you can do

var dict = new Dictionary<byte[], string>(new ByteArrayComparer());

Solution for 2.0

public class ByteArrayComparer : IEqualityComparer<byte[]> {
  public bool Equals(byte[] left, byte[] right) {
    if ( left == null || right == null ) {
      return left == right;
    }
    if ( left.Length != right.Length ) {
      return false;
    }
    for ( int i= 0; i < left.Length; i++) {
      if ( left[i] != right[i] ) {
        return false;
      }
    }
    return true;
  }
  public int GetHashCode(byte[] key) {
    if (key == null)
      throw new ArgumentNullException("key");
    int sum = 0;
    foreach ( byte cur in key ) {
      sum += cur;
    }
    return sum;
  }
}
Uwe Keim
  • 39,551
  • 56
  • 175
  • 291
JaredPar
  • 733,204
  • 149
  • 1,241
  • 1,454
  • @JaredPar, thanks this is what I want. I am stuck on 2.0 however, so if you happen to have an easy way on 2.0 I'd love to see it. Thanks. – Jason Sep 17 '09 at 18:26
  • I'm specifically curious about the Sum method (comparing two byte[]'s is pretty simple). Does Sum use an MD5 sum? – Jason Sep 17 '09 at 18:35
  • @Jason added a 2.0 solution. The Sum method just quite literally sums the values – JaredPar Sep 17 '09 at 19:40
  • 3
    Summing the results may not be the best hash code. Perhaps: sum = 33 * sum + cur; – user7116 Sep 17 '09 at 19:56
  • Hmm, yeah this is that "error prone" part I was talking about... :) Getting a truly unique hash code will be tricky. sixlettervariables idea isn't terrible but how can I verify that this is mathematically valid? – Jason Sep 17 '09 at 20:30
  • You rarely need a perfect hash code, you need relatively unique hash code. The "Times 33" is a popular, simple hash code, I think used by Perl at one point (like in the 5.0 and prior days). It uses prime or relatively prime numbers to get decent distributions. The wiki page for Hashtables has a good explanation: http://en.wikipedia.org/wiki/Hash_table – user7116 Sep 17 '09 at 21:06
  • Ah, that reminds of my data structures class back in college. Thanks for the info everyone this is working great. – Jason Sep 17 '09 at 21:08
  • [MSDN](http://msdn.microsoft.com/en-us/library/ms132151(v=vs.100).aspx): "We recommend that you derive from the EqualityComparer class instead of implementing the IEqualityComparer interface." – SerG Mar 04 '14 at 15:16
  • @SerG I would think a programmer who works at Microsoft would be aware of that. – JAB Mar 04 '14 at 15:22
  • @JAB So why did he recommend this way? – SerG Mar 04 '14 at 15:30
  • 1
    @SerG Because in this case, `EqualityComparer.Default.Equals` is the same as `Object.Equals`, so there's no point. – JAB Mar 04 '14 at 15:34
  • 16
    .NET 4 apparently introduced an equivalent, though: [`StructuralComparisons.StructuralEqualityComparer`](http://msdn.microsoft.com/en-us/library/system.collections.structuralcomparisons.structuralequalitycomparer.aspx), which reduces the solution to `var dict = new Dictionary(StructuralComparisons.StructuralEqualityComparer);`. (Which wouldn't have helped Jason due to the 2.0 restriction, but it's good to know.) – JAB Mar 04 '14 at 15:48
  • 1
    @JAB but I get `cannot convert from 'System.Collections.IEqualityComparer' to 'System.Collections.Generic.IEqualityComparer'` with StructuralComparisons.StructuralEqualityComparer. – SerG Mar 07 '14 at 12:46
  • 4
    @SerG Hm, I hadn't noticed that there's no generic version of `StructuralComparisons.StructuralEqualityComparer`. – JAB Mar 07 '14 at 13:00
  • @JAB So cannot it be used in such way and is an explicitly call of its Equals() necessary? – SerG Mar 07 '14 at 13:13
  • 1
    @SerG Not natively, it seems. Someone implemented a generic wrapper for it in another answer, though: http://stackoverflow.com/a/5601068/138772. – JAB Mar 07 '14 at 16:28
  • 1
    Just a note: "The GetHashCode method should not throw exceptions." – navossoc Jul 13 '17 at 03:28
15

So, JaredPar's answer is not bad but it could be better in a few ways. First of all, the IEqualityComparer page says "We recommend that you derive from the EqualityComparer class instead of implementing the IEqualityComparer interface."

Second, the implementation of GetHashCode is supposed to be fast. It's used to quickly eliminate obviously different objects, that would obviously be a waste of time to run Equals on. So GetHashCode should be much faster than actually running Equals.

Third, returning the sum of the byte array as JaredPar has done, is very likely to produce collisions - if the bytes are in different order, or the relative differences cancel each other out, etc.

So I would recommend a solution like this instead:

public class ByteArrayComparer : EqualityComparer<byte[]>
{
    public override bool Equals(byte[] first, byte[] second)
    {
        if (first == null || second == null) {
            // null == null returns true.
            // non-null == null returns false.
            return first == second;
        }
        if (ReferenceEquals(first, second)) {
            return true;
        }
        if (first.Length != second.Length) {
            return false;
        }
        // Linq extension method is based on IEnumerable, must evaluate every item.
        return first.SequenceEqual(second);
    }
    public override int GetHashCode(byte[] obj)
    {
        if (obj == null) {
            throw new ArgumentNullException("obj");
        }
        // quick and dirty, instantly identifies obviously different
        // arrays as being different
        return obj.Length;
    }
}

Above, returning obj.Length, is really quick and dirty, but also prone to return a lot of collisions. I think we can do better.

If you're going to examine all the bytes, something like this is less collision prone than the simple sum of bytes as in JaredPar's answer. But again, this examines all the elements, so it's not going to perform better than actually running Equals. You might as well just return 0 unconditionally, and always force the use of Equals.

I emphasize: this is better than returning the sum as in JaredPar's answer. And always returning 0 is better than this. And returning obj.Length is better than returning 0.

// This is not recommended. Performance is too horrible.
public override int GetHashCode(byte[] obj)
{
    // Inspired by fletcher checksum. Not fletcher.
    if (obj == null) {
        throw new ArgumentNullException("obj");
    }
    int sum = 0;
    int sumOfSum = 0;
    foreach (var val in obj) {
        sum += val; // by default, addition is unchecked. does not throw OverflowException.
        sumOfSum += sum;
    }
    return sum ^ sumOfSum;
}

If you happen to know that the byte[] arrays you're using as the key were themselves cryptographic hashes, then you can utilize this assumption to your benefit, and simply return the first 4 bytes converted to an int. It probably works alright too, for general-purpose byte arrays:

// This implementation works great if you assume the byte[] arrays
// are themselves cryptographic hashes. It probably works alright too,
// for general-purpose byte arrays.
public override int GetHashCode(byte[] obj)
{
    if (obj == null) {
        throw new ArgumentNullException("obj");
    }
    if (obj.Length >= 4) {
        return BitConverter.ToInt32(obj, 0);
    }
    // Length occupies at most 2 bits. Might as well store them in the high order byte
    int value = obj.Length;
    foreach (var b in obj) {
        value <<= 8;
        value += b;
    }
    return value;
}
Edward Ned Harvey
  • 6,525
  • 5
  • 36
  • 45
  • I like this line of thinking. I think for general purpose, maybe taking 2 bytes from the start and 2 bytes from the end might offer some advantages, like when working with a mix of little endian/big endian byte arrays. Of course, a more optimal solution is always available by knowing something about the structure of that data :) – Wayne Uroda Sep 08 '17 at 05:25
  • Moneyball: "_You might as well just return 0 unconditionally [if you're adding all the byte values in `GetHashCode`], and always force the use of Equals._" Really like the "create an int from the first four bytes" advice too, both of which thankfully shocked me back to remembering what `GetHashCode` is supposed to be doing. Excellent. – ruffin Nov 13 '20 at 15:07
4

Could you convert the byte[] to a string and use that as the key?

Something like:

        ASCIIEncoding enc = new ASCIIEncoding();
        byte[] input;
        string demo = new string(enc.GetChars(input));
        byte[] decode = enc.GetBytes(demo.ToCharArray());
JDunkerley
  • 12,355
  • 5
  • 41
  • 45
  • While this would cost more on performance, having to copy the data, could it be faster for retrieval since string has a better hash algorithm than iterating over the byte array? – Bryan Legend Jun 17 '15 at 23:05
4
using System;
using System.Collections;
using System.Collections.Generic;

[Serializable]
class StructuralEqualityComparer : IEqualityComparer, IEqualityComparer<object>
{
    public new bool Equals(object x, object y)
    {
        var s = x as IStructuralEquatable;
        return s == null ? object.Equals(x, y) : s.Equals(y, this);
    }

    public int GetHashCode(object obj)
    {
        var s = obj as IStructuralEquatable;
        return s == null ? EqualityComparer<object>.Default.GetHashCode(obj) : s.GetHashCode(this);
    }
}
Vladimir Reshetnikov
  • 11,750
  • 4
  • 30
  • 51
  • This works with byte[] perfectly (for now, I didn't find any problem yet), and looks the best. Thanks! – beatcoder Apr 23 '20 at 06:53
  • Update: It works, but it I'm using a byte[4] as key for dictionary, incrementing its bytes and after { 0, 32, 0, 0 } is can not insert because there are duplicates of the key. It can insert about 270k entries only, because of the hashing algorithm limitations. Not bad, though. – beatcoder Apr 23 '20 at 08:05
3

Your thought was my first thought as well. I don't think that it would be error prone. But if you don't like that option you could create a class that implements IEqualityComparer and pass an instance of it to the Dictionary's constructor.

Bryan
  • 2,775
  • 3
  • 28
  • 40
  • I considered computing my own hash code of the byte[] to be error prone, but it looks like there is no avoiding this... – Jason Sep 17 '09 at 18:33
1

Just made the EqualityComparer a little more generic, by not working on arrays but on IEnumerable<T>.

Due to the fact, that we now have a T we need to be able to specify an optional equality comparer for the elements.

Last but not least, GetHashCode() should never throw and sometimes you need it fast and sometimes you need it more accurate in the first run. So you can optionally define an accuracy from how many items (maximum) the hash code should be taken into account for our own hash.

public class EnumerableEqualityComparer<T> : IEqualityComparer<IEnumerable<T>>
{
    private static readonly Lazy<IEqualityComparer<IEnumerable<T>>> Lazy = new Lazy<IEqualityComparer<IEnumerable<T>>>(() => new EnumerableEqualityComparer<T>());
    private int accuracy;
    private IEqualityComparer<T> comparer;

    public EnumerableEqualityComparer()
        : this(-1)
    {
    }

    public EnumerableEqualityComparer(int accuracy)
        : this(accuracy, null)
    {
    }

    public EnumerableEqualityComparer(IEqualityComparer<T> elementEqualityComparer)
        : this(-1, elementEqualityComparer)
    {
    }

    public EnumerableEqualityComparer(int accuracy, IEqualityComparer<T> elementEqualityComparer)
    {
        if (accuracy < 0)
        {
            accuracy = 4;
        }

        this.accuracy = accuracy;
        comparer = elementEqualityComparer ?? EqualityComparer<T>.Default;
    }

    public static IEqualityComparer<IEnumerable<T>> Default { get; private set; } = Lazy.Value;

    public bool Equals(IEnumerable<T> x, IEnumerable<T> y)
    {
        if (ReferenceEquals(x, y))
        {
            return true;
        }

        if (ReferenceEquals(x, null)
            || ReferenceEquals(y, null))
        {
            return false;
        }

        return x.SequenceEqual(y, comparer);
    }

    public int GetHashCode(IEnumerable<T> obj)
    {
        if (ReferenceEquals(obj, null))
        {
            return -1;
        }

        var count = (obj as ICollection<T>)?.Count ?? 1;
        var hashCode = count * 49297;

        foreach (var item in obj.Take(accuracy))
        {
            hashCode += comparer.GetHashCode(item) * 17123;
        }

        return hashCode;
    }
}
Oliver
  • 43,366
  • 8
  • 94
  • 151
-3

When you are retrieving the items from the Dictionary you are using new operator for the byte[]. This will look for a different (new) byte[] instance in the Dictionary which is not present.

Here is a solution that will work:

 var dict = new Dictionary<byte[], string>();

            var b = new byte[] { 1,2,3};

            dict[b] = "my string";

            var value = dict[b]; 

            Console.WriteLine(value);
azamsharp
  • 19,710
  • 36
  • 144
  • 222
  • This is no solution. If you access the dictionary with another byte array it will throw an exception. var c = new byte[] { 1, 2, 3 }; var value = dict[c]; Last line will fail. – sahl04 Apr 05 '19 at 08:46