1

My object will has an Array of bytes, it may have thousands of elements. This array will be set during construction and then never change. I need to be able to compare the arrays from 2 separate objects to see if the are exactly the same.

I know I can use Enumerable.SequenceEqual to compare two values, but that has an overhead that I want to avoid.

My plan is to use something like this Good GetHashCode() override for List of Foo objects respecting the order right after generating the collection and storing that hash for comparison.

I am wondering if there is an immutable collection type build into C# or .Net that already does this, or if there is a better option I have overlooked.

Community
  • 1
  • 1
Anthony Nichols
  • 1,586
  • 2
  • 25
  • 50
  • Something like this? https://blogs.msdn.microsoft.com/dotnet/2013/09/25/immutable-collections-ready-for-prime-time/ – Darren Wainwright Oct 06 '16 at 19:20
  • 2
    HashCode is not good idea due to possible collision. In real life, hash code is used to indicates that obejcts are "possible equal" or "definitely not equal". – pwas Oct 06 '16 at 19:20
  • 2
    @nopeflow "definitely not equal" is enough to avoid more costly *SequenceEqual* checks for every item... SequenceEqual can be used only when hashcodes are equal.... – L.B Oct 06 '16 at 19:24
  • @L.B Yeep, that's right and good place to improve performance - but we can't depend only on hash code ;) – pwas Oct 06 '16 at 19:25
  • @Darren - From what I can see those collections don't return any hashcodes for the collections... and I don't see anything that would save me time over just using an array or IEnumerable. – Anthony Nichols Oct 06 '16 at 19:32
  • @nopeflow & L.B Yeep -- thanks for those tips - I will keep them in mind. – Anthony Nichols Oct 06 '16 at 19:33

1 Answers1

1

I have put together a few different methods of comparing arrays of bytes, I have used an arbitrary array length of 10000 and assumed that both compared arrays are of the same length (because a "broad phase" length check is obviously not very interesting :) )

Perhaps you could use this as a basis for making a decision on which method to use when comparing the arrays for equality.

The results are the average of 5 iterations for three scenarios (equal, first element different and last element different) and the timings are in ms.

---------------
Identical elements
---------------
SequenceEqual: 5.98142
BasicEqual: 0.11864
UnsafeMemCmp: 0.15542
SafeMemCmp: 0.12896
---------------
First element different
---------------
SequenceEqual: 0.00056
BasicEqual: 0.00012
UnsafeMemCmp: 0.0002
SafeMemCmp: 0.00182
---------------
Last element different
---------------
SequenceEqual: 0.14942
BasicEqual: 0.03178
UnsafeMemCmp: 0.0015
SafeMemCmp: 0.00326
---------------

The 4 methods I have chosen are:

SequentalEqual

static bool SequenceEqual(byte[] arr1, byte[] arr2)
{
    return arr1.SequenceEqual(arr2);
}

BasicEqual

static bool BasicEqual(byte[] arr1, byte[] arr2)
{
    for (var i = 0; i < 10000; i++)
        if (arr1[i] != arr2[i])
            return false;
     return true;
}

UnsafeMemCmp

[DllImport("msvcrt.dll", CallingConvention = CallingConvention.Cdecl)]
static extern unsafe int memcmp(byte* b1, byte* b2, int count);

static unsafe bool UnsafeMemCmp(byte[] arr1, byte[] arr2)
{
    fixed (byte* b1 = arr1, b2 = arr2)
    {
        return memcmp(b1, b2, 10000) == 0;
    }
}

SafeMemCmp

[DllImport("msvcrt.dll", CallingConvention = CallingConvention.Cdecl)]
static extern int memcmp(IntPtr b1, IntPtr b2, int count);

static bool SafeMemCmp(byte[] arr1, byte[] arr2)
{
    var a = Marshal.AllocHGlobal(arr1.Length);
    var b = Marshal.AllocHGlobal(arr2.Length);

    try
    {        
        Marshal.Copy(arr1, 0, a, arr1.Length);
        Marshal.Copy(arr2, 0, b, arr2.Length);

        return memcmp(a, b, 10000) == 0;
    }
    finally
    {
        Marshal.FreeHGlobal(a);
        Marshal.FreeHGlobal(b);
    }
}

For completion, the tests were run using the following method:

static void RunTest(string name, Func<byte[], byte[], bool> action, byte[] a, byte[] b)
{
    TimeSpan total = TimeSpan.Zero;

    for (var i = 0; i < 5; i++)
    {
        _stopwatch.Reset();
        _stopwatch.Start();
        action(a, b);
        _stopwatch.Stop();
        total += _stopwatch.Elapsed;
    }

    Console.WriteLine(name + ": " + (total.TotalMilliseconds / 5));
}
Scott Perham
  • 2,410
  • 1
  • 10
  • 20