-1

Hi I'm trying to find the fastest way to compare two double arrays in C#. Happy to use unsafe if it is suitable. One of the biggest issues I've seen with byte compare solutions is I don't want to copy the double array to a byte array to do pinvoke for example. If there was a performant way of treating the double array as a byte array to pass to pinvoke memcmp call that would be great.

This is the byte array compare solution I am referring to. Comparing two byte arrays in .NET

I am aiming for faster than iterating and comparing the elements in two double arrays.

For reference, my problem requires comparing these arrays approximately 10 trillion times. This component of the computation takes approximately 30% of the total in the operation so significant savings could be made here.

Currently we're run .NET 4.6 - 4.8.

Shiv
  • 1,274
  • 1
  • 19
  • 23
  • 1
    So faster than what? what are we trying to beat here, or should we innately have these benchmarks done for you? I would suggest, getting benchmark.net and trying some solutions – TheGeneral Jul 15 '20 at 03:29
  • @TheGeneral Faster than looping and comparing the elements of two double arrays. – Shiv Jul 15 '20 at 03:30
  • Did you try SequenceEquals how does that compare, what about unsafe pointer comparison. – TheGeneral Jul 15 '20 at 03:31
  • @TheGeneral unsafe pointer comparison - memcmp takes a byte array not a double array hence I need to somehow treat the double pointer as a byte pointer with appropriate length adjustment. Not sure how to do that without copying the double array to a byte array using bitconverters etc. I am concerned it is suboptimal. – Shiv Jul 15 '20 at 03:33
  • i mean `fixed (double* p = arr)` – TheGeneral Jul 15 '20 at 03:35
  • Looking at the question you linked, it looks like using `ReadOnlySpan.SequenceEqual` would be a good starting point for your benchmarks seeing as it wins most of the benchmarks in the provided answer. – Chris Jul 15 '20 at 03:36
  • 2
    If you're concerned a solution would be sub-optimal, add it to your benchmarks. If you aren't benchmarking potential solutions, take some time to setup [BenchmarkDotNet](https://benchmarkdotnet.org/) – Chris Jul 15 '20 at 03:43
  • @Chris Requires .NET Core 2.1+? I should update my question. I need something on .NET 4.6 - 4.8. – Shiv Jul 15 '20 at 04:00
  • How many elements in the array? – TheGeneral Jul 15 '20 at 04:05
  • @TheGeneral if I have double[] input1 and I want to have a byte pointer do you know the syntax for that conversion? – Shiv Jul 15 '20 at 04:07
  • @TheGeneral variable... have to check input1.Length (and validate == input2.Length) to do the compare. – Shiv Jul 15 '20 at 04:08
  • Added a benchmark – TheGeneral Jul 15 '20 at 05:10

1 Answers1

6

Updated with some benchmarks (see the note at the end):

[SimpleJob(RuntimeMoniker.Net472)]
public class Test
{
   private double[] data1;
   private double[] data2;
   [Params(1000, 10000000)] public int N;

   [GlobalSetup]
   public void Setup()
   {
      var r = new Random(42);
      data1 = Enumerable.Range(0, N).Select(x => r.NextDouble()).ToArray();
      data2 = data1.ToArray();
   }

   [DllImport("msvcrt.dll", CallingConvention = CallingConvention.Cdecl)]
   private static extern int memcmp(IntPtr a1, IntPtr a2, uint count);

   [Benchmark]
   public unsafe bool Memcmp()
   {
      fixed (double* p1 = data1, p2 = data2)
         return memcmp((IntPtr) p1, (IntPtr) p2, (uint) data1.Length * sizeof(double)) == 0;
   }

   [Benchmark]
   public unsafe bool UnsafeCompare()
   {
      fixed (double* p1 = data1, p2 = data2)
         for (var i = 0; i < data1.Length; i++)
            if (p1[i] != p2[i])
               return false;
      return true;
   }

   [Benchmark]
   public bool SequenceEqual()
   {
      return data1.SequenceEqual(data2);
   }

   [Benchmark]
   public bool RegularCompare()
   {
      for (var i = 0; i < data1.Length; i++)
         if (data1[i] != data2[i])
            return false;
      return true;
   }

   [Benchmark]
   public unsafe bool SpanSequenceEqual1()
   {
      fixed (double* p1 = data1, p2 = data2)
         return new Span<double>(p1, data1.Length).SequenceEqual(data2);
   }

   [Benchmark]
   public unsafe bool SpanSequenceEqual2()
   {
      fixed (double* p1 = data1, p2 = data2)
         return new Span<double>(p1, data1.Length).SequenceEqual(new Span<double>(p2, data2.Length));
   }
}

There is no fault tolerance here, and would be prudent to add. Also this is totally untested.

Benchmark for 2 sizes of arrays:

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.18362.900 (1903/May2019Update/19H1)
Intel Core i7-7700 CPU 3.60GHz (Kaby Lake), 1 CPU, 8 logical and 4 physical cores
  [Host]     : .NET Framework 4.8 (4.8.4180.0), X64 RyuJIT
  .NET 4.7.2 : .NET Framework 4.8 (4.8.4180.0), X64 RyuJIT

Job=.NET 4.7.2  Runtime=.NET 4.7.2

|             Method |        N |             Mean |           Error |        StdDev |
|------------------- |--------- |-----------------:|----------------:|--------------:|
|             Memcmp |     1000 |         280.0 ns |         1.59 ns |       1.49 ns |
|      UnsafeCompare |     1000 |         536.5 ns |         2.35 ns |       1.84 ns |
|      SequenceEqual |     1000 |      12,020.5 ns |       238.08 ns |     370.66 ns |
|     RegularCompare |     1000 |         807.9 ns |         4.57 ns |       4.05 ns |
| SpanSequenceEqual1 |     1000 |         561.7 ns |         7.67 ns |       7.18 ns |
| SpanSequenceEqual2 |     1000 |         556.8 ns |         4.59 ns |       4.07 ns |
|             Memcmp | 10000000 |  10,809,916.0 ns |   215,874.22 ns | 302,625.51 ns |
|      UnsafeCompare | 10000000 |  11,357,531.6 ns |   226,782.10 ns | 242,654.31 ns |
|      SequenceEqual | 10000000 | 117,777,038.7 ns | 1,055,512.03 ns | 987,326.61 ns |
|     RegularCompare | 10000000 |  11,857,691.7 ns |   186,827.02 ns | 165,617.28 ns |
| SpanSequenceEqual1 | 10000000 |  11,371,142.7 ns |   151,452.88 ns | 141,669.12 ns |
| SpanSequenceEqual2 | 10000000 |  11,160,517.2 ns |   172,947.95 ns | 153,313.85 ns |

Warning, this is not the best example of a benchmark, you should really run these benchmarks yourself on your own data, on your own environment. If you are going for peak performance you will need to factor in things like how likely it is for a fall array scan, the size of the array etc.

halfer
  • 19,824
  • 17
  • 99
  • 186
TheGeneral
  • 79,002
  • 9
  • 103
  • 141