6

I am looking for performance efficient ways to compare two byte[] for equality. Sizes are above 1 MB, so the overhead for each array element should be minimized.

I aim to beat the speeds of SequenceEqual or a hand-coded for-loop over every item, by avoiding the repetitive bound checks for both arrays. In the same way that Array.Copy could lead to fast memcpy, what will lead to a memcmp?

Community
  • 1
  • 1
shojtsy
  • 1,710
  • 3
  • 17
  • 24
  • Do you need to compare two blocks only, or one block against several? Perhaps if you told us more about the scenario you're doing this in, even better solutions could be found? For instance, if you need to compare a sequence of blocks against many other blocks, a simple hash would at the very least give you a lot of guaranteed differences with minimal work, and then you could focus on the potentially false positives. – Lasse V. Karlsen Jan 31 '10 at 22:39

4 Answers4

16

You can use unsafe code to do pointer operations. You can compare the bytes four at a time as integers:

public static bool ArrayCompare(byte[] a, byte[] b) {
  if (a.Length != b.Length) return false;
  int len = a.Length;
  unsafe {
    fixed(byte* ap = a, bp = b) {
      int* aip = (int*)ap, bip = (int*)bp;
      for (;len >= 4;len-=4) {
        if (*aip != *bip) return false;
        aip++;
        bip++;
      }
      byte* ap2 = (byte*)aip, bp2 = (byte*)bip;
      for (;len>0;len--) {
        if (*ap2 != *bp2) return false;
        ap2++;
        bp2++;
      }
    }
  }
  return true;
}

A tested this against a simple loop, and it's about six times faster.

As suggested by Josh Einstein, long could be used on a 64 bit system. Actually it seems to be almost twice as fast both on 32 and 64 bit systems:

public static bool ArrayCompare64(byte[] a, byte[] b) {
  if (a.Length != b.Length) return false;
  int len = a.Length;
  unsafe {
    fixed (byte* ap = a, bp = b) {
      long* alp = (long*)ap, blp = (long*)bp;
      for (; len >= 8; len -= 8) {
        if (*alp != *blp) return false;
        alp++;
        blp++;
      }
      byte* ap2 = (byte*)alp, bp2 = (byte*)blp;
      for (; len > 0; len--) {
        if (*ap2 != *bp2) return false;
        ap2++;
        bp2++;
      }
    }
  }
  return true;
}
Guffa
  • 687,336
  • 108
  • 737
  • 1,005
  • +1 Great example. Though, on x64 systems you should use Int64. – Josh Jan 31 '10 at 21:46
  • And I assume the same technique can be used to compare eight or sixteen bytes at a time (long, decimal..)? – Aistina Jan 31 '10 at 21:47
  • +1 Very good indeed, SequenceEqual gives me ~1sec for a 50mb array while yours gives a nice 77ms :) – Diadistis Jan 31 '10 at 21:50
  • @Josh: Yes, actually it seems to be faster on a 32 bit system too. – Guffa Jan 31 '10 at 22:03
  • 1
    @Aistina: A long, yes, but not a decimal. There are different bit combinations in a decimal that give the same value, so you can get false positives. Also, comparing decimals is not a simple bitwise comparison, so you don't get any good performance. – Guffa Jan 31 '10 at 22:10
12

If performance really matters then the fastest way to do it is by using the CRT library included with every version of Windows. This code takes ~51 msec on my poky laptop, works on 64-bit machines too:

using System;
using System.Runtime.InteropServices;
using System.Diagnostics;

class Program {
  static void Main(string[] args) {
    byte[] arr1 = new byte[50 * 1024 * 1024];
    byte[] arr2 = new byte[50 * 1024 * 1024];
    var sw = Stopwatch.StartNew();
    bool equal = memcmp(arr1, arr2, arr1.Length) == 0;
    sw.Stop();
    Console.WriteLine(sw.ElapsedMilliseconds);
    Console.ReadLine();
  }
  [DllImport("msvcrt.dll")]
  private static extern int memcmp(byte[] arr1, byte[] arr2, int cnt);
}
Hans Passant
  • 922,412
  • 146
  • 1,693
  • 2,536
  • 1
    +1. There are other things like memory alignment that are probably considered in the CRT version. Not reinventing the wheel in the unsafe code is the way to go. Of course, only after profiling and proving that it's worth it--the standard disclaimer. – Mehrdad Afshari Jan 31 '10 at 22:24
  • +1. Much better to use a well tested optimised routine than rolling your own and hoping that it will somehow be as fast on whatever platform you happen to be running on. – Jason Williams Jan 31 '10 at 22:43
  • don't forget to pin the arrays in place! – Sebastian Good Feb 22 '10 at 18:53
  • Does this worth doing for short arrays? How long should the arrays be to make this worth using? How about calling this using C++/CLI? – brickner Jun 08 '10 at 19:41
  • 1
    Let us know when you find out. – Hans Passant Jun 08 '10 at 19:53
1

From: http://www.pinvoke.net/default.aspx/msvcrt.memcmp : Belowmentioned signature (by Saar) of memcmp is a x64 only signature. Using the the x64 only signatures on an x86 machine will result in a PInvoke stack imbalance. For x86 and x64 platform compatibility make sure you use a signature which specifies the Cdecl calling convention and uses the UIntPtr type to correctly marshall the size_t count argument:

    [DllImport("msvcrt.dll", CallingConvention = CallingConvention.Cdecl)]
    static extern int memcmp(byte[] b1, byte[] b2, UIntPtr count);

    static bool doImagesMatch(byte[] b1, byte[] b2) 
    {     
        return b1.Length == b2.Length && memcmp(b1, b2, new UIntPtr((uint)b1.Length)) == 0;
    } 

I use this code with success but I didn't have time to measure performance (yet). I'm using small array's of roughly 600 bytes. I have to use x86-compatible code because the vast majority of the computers in our non-profit organisation is x86.

Obviously you need a fast algorhythm to convert bitmap to byte[].

B. Verhoeff
  • 143
  • 1
  • 6
0

[DllImport("msvcrt.dll")] unsafe static extern int memcmp(void* b1, void* b2, long count);

    unsafe static int ByteArrayCompare1(byte[] b1, int b1Index, int b1Length, byte[] b2, int b2Index, int b2Length)
    {
        CompareCount++;
        fixed (byte* p1 = b1)
        fixed (byte* p2 = b2)
        {
            int cmp = memcmp(p1 + b1Index, p2 + b2Index, Math.Min(b1Length, b2Length));
            if (cmp == 0)
            {
                cmp = b1Length.CompareTo(b2Length);
            }

            return cmp;
        }
    }
Saar
  • 1,753
  • 6
  • 20
  • 32