10

I have 2 int arrays.

int[] data1;
int[] data2;

I want to create a 3rd array, int[] data3, which holds the differences between the 2 other arrays.

Let us take the 1st value in data1.

The value is 15 (e.g.).

Now let us take the 1st value in data2.

The value is 3 (e.g.).

The 1st value in data3 would be 12.

BUT, if the 1st values were the other way round, i.e.

data1[0] = 3
data2[0] = 15

then the difference would be -12. Yet I want it to be just 12.

At the moment I have a for loop, and I do the computation there to get that kind of result.

  1. Is there a way to do data1-data2 = data3 without enumerating through a loop?
  2. If so, can I just get the differences without using minus numbers?

Thanks

N.B. In response to the 'closers': I agree with you to a certain point. What I need to add to this question is this:

I am looking for the most efficient way (quickest, with low memory as a 2nd priority) to do this. Using LINQ (as I understand it) can be the slowest approach?

Andrew Simpson

4 Answers

8

You are looking for the Zip method:

var data3 = data1.Zip(data2, (d1,d2) => Math.Abs(d1 - d2)).ToArray();

Enumerable.Zip<TFirst, TSecond, TResult> Method

Applies a specified function to the corresponding elements of two sequences, producing a sequence of the results.

So it simply takes each pair of corresponding elements, e.g. data1[0] and data2[0], then data1[1] and data2[1], and so on, then applies the function Math.Abs(d1 - d2), which subtracts the two numbers and takes the absolute value of the result. It returns a sequence containing the result of each operation.
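
A quick example (with made-up values; requires using System.Linq):

int[] data1 = { 15, 7, 4 };
int[] data2 = { 3, 10, 4 };

// Pairs up 15|3, 7|10, 4|4 and applies Math.Abs to each difference.
var data3 = data1.Zip(data2, (d1, d2) => Math.Abs(d1 - d2)).ToArray();
// data3 is { 12, 3, 0 }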

Selman Genç
7

"Is there a way to do data1-data2 = data3 without enumerating through a loop?" No. It is technically impossible.

At best, or rather worst, you can call a function that will do the enumeration for you. But it will be slow. In the case of LINQ, ungodly slow.

On the machine I am currently working on, the results for the other answers are as follows, for a 4 KB table (1024 integers); a sketch of a measuring harness follows the list.

  • 23560 ticks - Giannis Paraskevopoulos. Array-to-Enumerable-to-Array conversions are not too fast; copying an array via the ToList().ToArray() chain is roughly 25 times slower than Array.Copy().
  • 10198 ticks - Selman22. 2 times faster, but still slow. Lambdas are eye candy to make creating events prettier, not faster. You end up with an anonymous method lying around, whose call/return overhead can eat more CPU time than the operation itself (remember that the math we do here the CPU can do in just a few cycles).
  • 566 ticks - Tim Schmelter's GetDifference() function. (The main culprit here is the JIT; in native code, and/or with more frequent usage, the difference would be negligible.)
  • 27 ticks - Just a loop. 400 times faster than Zip, over 800 times faster than converting the array to a list and back.
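
For completeness, a minimal sketch of one way to take such measurements. This is my illustration, not the exact harness behind the numbers above (only the plain loop and Zip variants are shown, and the class name is mine); as discussed further down, run it as a Release build outside the debugger.

using System;
using System.Diagnostics;
using System.Linq;

class AbsDiffBenchmark   // illustrative name, not from the original answer
{
    static void Main()
    {
        var rnd = new Random(42);
        int[] data1 = Enumerable.Range(0, 1024).Select(_ => rnd.Next(1000)).ToArray();
        int[] data2 = Enumerable.Range(0, 1024).Select(_ => rnd.Next(1000)).ToArray();
        int[] data3 = new int[data1.Length];

        // Warm both paths up once so JIT compilation is not part of the measurement.
        data1.Zip(data2, (a, b) => Math.Abs(a - b)).ToArray();
        for (int i = 0; i < data3.Length; i++) data3[i] = Math.Abs(data1[i] - data2[i]);

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < data3.Length; i++) data3[i] = Math.Abs(data1[i] - data2[i]);
        sw.Stop();
        Console.WriteLine("Plain loop: {0} ticks", sw.ElapsedTicks);

        sw.Restart();
        int[] zipped = data1.Zip(data2, (a, b) => Math.Abs(a - b)).ToArray();
        sw.Stop();
        Console.WriteLine("Zip:        {0} ticks", sw.ElapsedTicks);
    }
}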

Loop code:

for (int i = 0; i < data3.Length; i++)
{
  data3[i] = Math.Abs(data1[i] - data2[i]);
}

Such basic memory operations can be translated directly to machine code, without the horrible performance and humongous memory footprint of LINQ.

The moral of the story is: LINQ is for readability (which in this case is arguable), not for performance (which in this case is noticeable).


Optimization time! Let's abuse our CPU a tiny bit.

  1. Unroll the loop. Or do not. Your experience may vary. Even in assembler itself, the performance gain or loss from loop unrolling varies greatly within the same family of processors. Newer CPUs and compilers are aware of the old tricks and simply implement them on their own. On the i3-3220 I tested the code on, unrolling to 4 lines resulted in faster execution for 32-bit code, but on 64-bit it was a bit slower, while unrolling to 8 was the opposite.
  2. Compile for x64. As we are working on 32-bit data here, we won't make use of 64-bit registers... or will we? On x86, less than half of the registers are truly available to generated code (in assembly written by hand you can always squeeze out more); on x64, however, you get eight bonus registers which are free to use. The more you can do without accessing memory, the faster your code. In this case the speed gain is about 20%.
  3. Close Visual Studio. Do not speed-test 64-bit code in the 32-bit IDE (there is no 64-bit version as of now, and probably won't be for a long time). It will make x64 code roughly two times slower due to the architecture mismatch. (Well... you should never speed-test code under a debugger anyway...)
  4. Do not use built-in functions too much. In this case, Math.Abs has overhead hidden inside. For some reason (which would need an analysis of the IL to figure out), checking for negative values was faster with ?: than with If-Else. That check saved a lot of time.

UPDATE: ?: is faster than If-Else due to differences in the resulting machine code... at least for simply comparing two values. Its machine code is much less weird than If-Else's (which does not look like what you would write by hand). Apparently it is not just a different way of writing an If-Else statement but a fully separate construct, optimized for simple conditional assignment.
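
To make the comparison concrete, here are the three variants side by side (my framing of the code discussed above; the method name is mine, and variants 1 and 2 are left commented out):

static void AbsVariants(int[] data1, int[] data2, int[] data3)
{
    for (int i = 0; i < data3.Length; i++)
    {
        int b = data1[i] - data2[i];

        // Variant 1: built-in, with the hidden overhead:
        // data3[i] = Math.Abs(b);

        // Variant 2: If-Else, compiles to comparatively clumsy machine code:
        // if (b < 0) data3[i] = -b; else data3[i] = b;

        // Variant 3: conditional operator, the fastest of the three here:
        data3[i] = b < 0 ? -b : b;
    }
}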

The resulting code was roughly 8 times faster than the simple loop with Math.Abs(). Remember that you can only unroll the loop to divisors of your dataset size. You wrote that your dataset size is 25920, so 8 is fine (the max is 64, but I doubt it would make any sense to go that high). I suggest hiding this code in some function, as it is fugly.

int[] data3 = new int[data1.Length];
for (int i = 0; i < data1.Length; i += 8)
{
    int b;
    b = (data1[i + 0] - data2[i + 0]);
    data3[i + 0] = b < 0 ? -b : b;
    b = (data1[i + 1] - data2[i + 1]);
    data3[i + 1] = b < 0 ? -b : b;
    b = (data1[i + 2] - data2[i + 2]);
    data3[i + 2] = b < 0 ? -b : b;
    b = (data1[i + 3] - data2[i + 3]);
    data3[i + 3] = b < 0 ? -b : b;
    b = (data1[i + 4] - data2[i + 4]);
    data3[i + 4] = b < 0 ? -b : b;
    b = (data1[i + 5] - data2[i + 5]);
    data3[i + 5] = b < 0 ? -b : b;
    b = (data1[i + 6] - data2[i + 6]);
    data3[i + 6] = b < 0 ? -b : b;
    b = (data1[i + 7] - data2[i + 7]);
    data3[i + 7] = b < 0 ? -b : b;
}

This is not even its final form. I will try to do some more heretic tricks on it.

BitHacks, low-level cheats!

As I mentioned, there was still room for improvement.

After cutting out LINQ, the main tick-muncher was Abs(). Once it was removed from the code, we were left with a contest between If-Else and the shorthand ?: operator. Both are branching operations, which once upon a time were widely known to be slower than linear code. Currently, ease of use/writing tends to be picked over performance (sometimes correctly, sometimes incorrectly).

So let's make our branching code linear. This is possible by abusing the fact that the branch in this code contains math operating on just a single variable. So let's build a branch-free equivalent of it.

Now, do you remember how to negate a two's complement number? Negate all the bits and add one. Let's do it in one line, without conditions!

It is bitwise operators' time to shine. OR and AND are boring; real men use XOR. What's so cool about XOR? Aside from its usual behavior, you can also turn it into NOT (negation) and NOP (no-operation).

1 XOR 1 = 0
0 XOR 1 = 1

so XOR'ing with a value filled with only 1's gives you a NOT operation.

1 XOR 0 = 1
0 XOR 0 = 0

so XOR'ing with a value filled with only 0's does nothing at all.

We can obtain the sign of our number. For a 32-bit integer it is as simple as x >> 31, which moves the sign bit to the lowest bit. As even the wiki will tell you, bits inserted from the left will be zeros, so the result of x >> 31 will be 1 for a negative number (x < 0) and 0 for a non-negative one (x >= 0), right?

Nope. For signed values, an arithmetic shift is used instead of a plain bit shift, so we will get -1 or 0 depending on the sign... which means that x >> 31 gives 111...111 for negative and 000...000 for non-negative. If you XOR the original x with the result of such a shift, you perform a NOT or a NOP depending on the sign. Another useful thing is that 0 acts as a NOP for addition/subtraction, so we can subtract either -1 (i.e. add 1) or 0, depending on the sign.

So x ^ (x >> 31) flips the bits of a negative number while making no change to a non-negative one, and x - (x >> 31) adds 1 to a negative x (a negated negative value gives a positive) and makes no change to a non-negative value.

When combined you get (x ^ (x >> 31)) - (x >> 31)... which translates to:

IF X<0
  X = ~X + 1

and that is just

IF X<0
  X = -X
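
A worked example (my own, using the question's value of 12):

int x = -12;           // 0xFFFFFFF4 in two's complement
int s = x >> 31;       // arithmetic shift: s = -1 (all 1's) for negative x, 0 otherwise
int flipped = x ^ s;   // XOR with all 1's is NOT: ~(-12) == 11; for x >= 0 it is a NOP
int abs = flipped - s; // 11 - (-1) == 12; for x >= 0 subtracting 0 changes nothing
// For x = 12: s = 0, so flipped = 12 and abs = 12. Same formula, no branch.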

How does this affect performance? Our XorAbs() requires just four basic integer operations, with one load and one store. A branching operation by itself takes about as many CPU ticks. And while modern CPUs are great at branch prediction, they are still faster when simply not doing it, i.e. when fed sequential code.

And what's the score?

  1. Roughly four times faster than the built-in Abs().
  2. About twice as fast as the previous code (the versions without unrolling).
  3. Depending on the CPU, it can get better results without loop unrolling. Thanks to the elimination of branching, the CPU can "unroll" the loop on its own. (Haswells are weird with unrolling.)

Resulting code:

for (int i = 0; i < data1.Length; i++)
{
  int x = data1[i] - data2[i];
  data3[i] = (x ^ (x >> 31)) - (x >> 31);
}

Parallelism and Cache usage

CPUs have super-fast cache memory; when processing an array sequentially, the CPU will copy whole chunks of it to the cache. But if you write crappy code, you will get cache misses. You can easily fall into this trap by screwing up the order of nested loops.

Parallelism (multiple threads, same data) must work on sequential chunks in order to make good use of the CPU cache.

Writing threads by hand would allow you to pick the chunks for each thread manually, but that is a bothersome way to do it. Since 4.0, .NET comes with helpers for that; however, the default Parallel.For makes a mess of the cache, so this code is actually slower than its single-threaded version due to cache misses:

Parallel.For(0, data1.Length,
fn =>
{
  int x = data1[fn] - data2[fn];
  data3[fn] = (x ^ (x >> 31)) - (x >> 31);
});

It is possible to make manual use of the cached data by performing sequential operations within it. For example, you can unroll the loop, but that is a dirty hack, and unrolling has its own performance issues (it depends on the CPU model):

Parallel.For(0, data1.Length >> 3,
i =>
{
    int j = i << 3; // each iteration of Parallel.For handles one block of 8 elements
    int b;
    b = (data1[j + 0] - data2[j + 0]);
    data3[j + 0] = b < 0 ? (b ^ -1) + 1 : b;
    b = (data1[j + 1] - data2[j + 1]);
    data3[j + 1] = b < 0 ? (b ^ -1) + 1 : b;
    b = (data1[j + 2] - data2[j + 2]);
    data3[j + 2] = b < 0 ? (b ^ -1) + 1 : b;
    b = (data1[j + 3] - data2[j + 3]);
    data3[j + 3] = b < 0 ? (b ^ -1) + 1 : b;
    b = (data1[j + 4] - data2[j + 4]);
    data3[j + 4] = b < 0 ? (b ^ -1) + 1 : b;
    b = (data1[j + 5] - data2[j + 5]);
    data3[j + 5] = b < 0 ? (b ^ -1) + 1 : b;
    b = (data1[j + 6] - data2[j + 6]);
    data3[j + 6] = b < 0 ? (b ^ -1) + 1 : b;
    b = (data1[j + 7] - data2[j + 7]);
    data3[j + 7] = b < 0 ? (b ^ -1) + 1 : b;
});

However, .NET also has Parallel.ForEach and load-balancing partitioners. By using both of them you get the best of all worlds:

  • dataset size independent code
  • short, neat code
  • multithreading
  • good cache usage

So the final code would be:

// Partitioner lives in System.Collections.Concurrent; Parallel in System.Threading.Tasks.
var rangePartitioner = Partitioner.Create(0, data1.Length);
Parallel.ForEach(rangePartitioner, (range, loopState)
=>
{
    for (int i = range.Item1; i < range.Item2; i++)
    {
        int x = data1[i] - data2[i];
        data3[i] = (x ^ (x >> 31)) - (x >> 31);
    }
});

It is far from maximal CPU usage (which is more complicated than just maxing out the clock; there are multiple cache levels, several pipelines and much more), but it is readable, fast and platform independent (except for the integer size, but C#'s int is an alias for System.Int32, so we are safe here).

I think we will stop optimizing here. This came out as an article rather than an answer; I hope no one will purge me for it.

PTwr
  • Hi, thanks for all that info. Yes, I had thought using LINQ was slower than 'traditional' methods. This is the same code as Tim's? But you have given me more info, though – Andrew Simpson Sep 09 '14 at 12:02
  • @AndrewSimpson Yes, I used their code. Also, I just updated the answer with slightly better results for the LINQ-based code. – PTwr Sep 09 '14 at 12:06
  • just making sure I was not missing something. You gave extra info with the speed results which gives me the context. Thanks – Andrew Simpson Sep 09 '14 at 12:12
  • @PTwr: Why does my `GetDifference` need 20 times more than your loop? I'm also just using a loop, so exactly the same code. Also, how have you tested it at all? I would also remove the sentence that starts with _"Such basic memory operations"_ since it's very confusing and LINQ is often not noticeably less efficient. – Tim Schmelter Sep 09 '14 at 12:15
  • @TimSchmelter Probably because your method is a method, so it has a tiny bit of overhead and worse JIT optimization for one-time use (when looping several thousand times it gets a tiny bit faster than an inline loop; JIT is weird). I refuse to remove "Such basic memory operations" as this task can be pretty much done "on the fly" by the CPU; LINQ is made for complex data-bound objects and should stay far away from such operations. – PTwr Sep 09 '14 at 12:48
  • @PTwr: it makes a difference whether you test a method one time with very large arrays or a million times with very small arrays. However, have you also measured the initialization of your third array? The method call itself is nearly a no-op. Make sure you measure release-mode code, as optimizations are turned off by default for debug builds. Also call a method _before_ you start measuring. I assume that you've used `Stopwatch` instead of `DateTime`. – Tim Schmelter Sep 09 '14 at 12:51
  • @TimSchmelter a method call is close to nothing only in the native code world ;) The managed code world with JIT compilation and optimization is weird (see [this](http://stackoverflow.com/questions/8928403/try-catch-speeding-up-my-code) question for an example of weirdness). I guess that in this case most of the time was munched by such weird stuff on the first run of the method. I have a few ideas to possibly make even faster code, but it will need to wait for later today/tomorrow. If you want to compare algorithms then native code (e.g. pure C) will give a clearer image. – PTwr Sep 09 '14 at 13:00
  • 1
    @PTwr: the difference between non-method and method [is absolutely negligible](http://stackoverflow.com/a/12384798/284240). Write safe, readable and maintainable code, not code that needs few milliseconds less in one million iterations. – Tim Schmelter Sep 09 '14 at 13:05
  • @TimSchmelter Watch out, take it too far and people will start doing math with LINQ and Lambdas ;) I have nothing against refactoring repeatable code; I added your answer to the list just to keep it complete, not to compete. I added notes to the post explaining the performance-drop culprit for all the answers. – PTwr Sep 09 '14 at 13:40
3

Here is another (less readable but maybe a little bit more efficient) approach that does not need LINQ:

public static int[] GetDifference(int[] first, int[] second)
{
    int commonLength = Math.Min(first.Length, second.Length);
    int[] diff = new int[commonLength];
    for (int i = 0; i < commonLength; i++)
        diff[i] = Math.Abs(first[i] - second[i]);
    return diff;
}

Why a little bit more efficient? Because ToArray has to keep resizing its buffer, since it does not know the final size in advance.
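
Usage would then be, for example (sample values mine):

int[] data1 = { 15, 7, 4 };
int[] data2 = { 3, 10, 4 };
int[] data3 = GetDifference(data1, data2);  // { 12, 3, 0 }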

Tim Schmelter
  • My God! I am being spoilt for choice here! Fantastic stuff from everyone. I need to speed test these 3 approaches and I will report back ASAP. Thanks – Andrew Simpson Sep 09 '14 at 11:45
  • @AndrewSimpson: are the arrays large or small but you need to call it often? – Tim Schmelter Sep 09 '14 at 11:52
  • 1
    @AndrewSimpson: Unless GetDifference is a bottleneck in your code, prefer readability, not performance. – Brian Sep 09 '14 at 11:54
  • Hi, the array size is a result of flattening a 2-d array. The initial 2d array is 144x180. The resultant flattened array is therefore 25920. I am comparing as many of these arrays as I can (which is my FPS rate). So, many times a second.. – Andrew Simpson Sep 09 '14 at 11:56
  • @Brian: I also like LINQ and have upvoted the Zip approach. But since the OP has already mentioned that performance matters, I have provided a non-LINQ approach. Also, this method is not really complex. – Tim Schmelter Sep 09 '14 at 11:57
  • @AndrewSimpson: not at all, however, I'm not yet convinced by the result of his performance tests. – Tim Schmelter Sep 09 '14 at 12:17
  • hi, OK, understood. I am at work at the moment but will be able to do some proper speed tests when I get back. I shall update my position then. Thanks :) – Andrew Simpson Sep 09 '14 at 12:19
1
var data3 = data1.Select((x,i)=>new {x,i})
    .Join
    (
        data2.Select((x,i)=>new {x,i}),
        x=>x.i,
        x=>x.i,
        (d1,d2)=>Math.Abs(d1.x-d2.x)
    )
    .ToArray();
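
This pairs the two arrays by index: Select((x, i) => new { x, i }) tags each element with its position, Join then matches the elements whose indices are equal, and the result selector takes the absolute difference of each matched pair. The output is the same as with Zip, at the cost of building an anonymous object per element.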
Giannis Paraskevopoulos