Memory comparison (with difference position)

Question

Is there a way to compare two blocks of memory, and know at which point they differ (memcmp() does not meet this requirement)? I wouldn't want to perform costly loops. Thanks in advance.

Regards, Neo_b

see also http://stackoverflow.com/questions/855895/intrinsic-memcmp about per-cpu optimized memcmp implementations. If you know the cpu you could tune one of gcc's __builtin_memcmp() functions to your needs. — mvds, Aug 24 '10 at 22:46
Note that anything you have here is going to be implemented as a loop *somewhere* -- there's no magic way of doing what you want here without one. — Billy ONeal, Aug 25 '10 at 02:11

score 7 · Answer 1 · edited Feb 02 '20 at 00:14

7

std::mismatch will do that for you in conjunction std::distance.

edited Feb 02 '20 at 00:14

jtbandes

115,675
35
233
266

answered Aug 24 '10 at 22:26

Eugen Constantin Dinca

8,994
2
34
51

You assumed he's using STL iterator, and moreover he need to know at which point memory differs. – Doomsday Aug 24 '10 at 22:29
I had std::equal first which was obviously wrong so I've corrected it. Algorithms work very nicely with pointers as well as with (full blown) iterators. – Eugen Constantin Dinca Aug 24 '10 at 22:31
3

@Doomsday: `char*` *is* an iterator type, and `mismatch` does return two iterators pointing to the difference. +1 – Potatoswatter Aug 24 '10 at 23:40
1

@Doomsday: To better clarify what other commenters have said, all of the STL algorithms work with plain pointers, not just iterators. – Billy ONeal Aug 25 '10 at 02:10
Err, my initial comment was on std::equal, not std::mismatch. I'm fine with the mismatch solution. – Doomsday Aug 25 '10 at 10:00
To be nitpicky, the link is to the docs for the sgi `mismatch`, which is *not* `std::mismatch` – Baruch Oct 02 '17 at 08:26

score 2 · Answer 2 · answered Aug 24 '10 at 22:24

2

Compared to whatever else you are doing, a loop is cheap: the big cost will be retrieving the data from ram (or disk!) in the first place.

answered Aug 24 '10 at 22:24

James

24,676
13
84
130

score 2 · Answer 3 · answered Aug 24 '10 at 22:25

2

You can't avoid looping with memory comparison of more than a few bytes. Write the algorithm as you can imagine it. It's simple enough and you might be amazed how well the compiler optimizes code like this.

answered Aug 24 '10 at 22:25

tenfour

36,141
15
83
142

score 2 · Answer 4 · edited Jan 12 '15 at 16:44

2

memcmp simply does a "costly loop", byte for byte. For example, here is Microsoft's implementation:

EXTERN_C int __cdecl memcmp(const void *Ptr1, const void *Ptr2, size_t Count)
{
    INT v = 0;
    BYTE *p1 = (BYTE *)Ptr1;
    BYTE *p2 = (BYTE *)Ptr2;

    while(Count-- > 0 && v == 0) {
        v = *(p1++) - *(p2++);
    }

    return v;
}

Most other implementations do the exact same thing. For your needs, you could do something like this:

long my_memcmp(const void *Ptr1, const void *Ptr2, size_t Count)
{
    INT v = 0;
    long pos = 0;
    BYTE *p1 = (BYTE *)Ptr1;
    BYTE *p2 = (BYTE *)Ptr2;

    while(Count-- > 0 && v == 0) 
    {
        v = *(p1++) - *(p2++);
        if (v == 0)
            pos++;
        else
            break;
    }

    return pos;
}

edited Jan 12 '15 at 16:44

Roland Pihlakas

4,246
2
43
64

answered Aug 24 '10 at 22:33

Andrew

1,765
2
17
24

byte-per-byte is indeed costly. 32-bit `int` operations may even be faster than their 8-bit counterparts. – mvds Aug 24 '10 at 22:48
I created my own implementation (I thought I could replace it with something else eventually). My needs require up to 10 000 000 iterations. The system freezes sometimes, but it works. It also says how many bytes do not match after a first non-match occurence. – Neo_b Aug 24 '10 at 23:47
@Neo_b: 10 million iterations is not that much -- most any system will do that in a quarter second or less. I'd be looking at your input buffering scheme, or considering rethinking how you are attacking this problem. If you are searching for strings, for example, the Boyer Moore algorithm will probably do you better than anything here. – Billy ONeal Aug 25 '10 at 02:15
I'm actually checking if two images (or their parts) are identical (for my remote desktop server). No need to send what hasn't changed, right? :) – Neo_b Aug 27 '10 at 02:51
Wouldn't you want an algorithm that returned all changed bytes then, not just the first location of the changed bytes? – Andrew Aug 30 '10 at 19:17

score 1 · Answer 5 · answered Aug 24 '10 at 23:00

If there was a better way of comparing two blocks of memory, memcmp would be reimplemented to do that.

Having said that often, memcmp has a default portable implementation in the standard C library but there are is often implemented by the compiler itself as a builtin function. This builtin function should be highly optimized for the target architecture.So take the library implementation with a pinch of salt.

mvds · Answer 6 · 2010-08-24T22:42:22.313

0

You will always need a loop. But you could benchmark if looping by 4 bytes (cast to int*) or by 8 bytes (uint64_t or long long int) is faster than the naive per-byte solution.

Even better, depending on the length (say, >1kb) you could unroll the loop, meaning you check e.g. per 8 int/uint64_t and on a mismatch pinpoint the first differing byte.

uint64_t *bigsteps1 = (uint64_t*)m1;
uint64_t *bigsteps2 = (uint64_t*)m2;
int steps = min(m1_len,m2_len)/sizeof(uint64_t);
int i;
for ( i=0; i<steps; i+=8 )
{
    if ( bigsteps1[i]   != bigsteps2[i]
      || bigsteps1[i+1] != bigsteps2[i+1]
   /// ....
      || bigsteps1[i+7] != bigsteps2[i+7] ) break;
}

// i<steps tells if we found a difference
// end game is trivial, left as an excercise to the reader.

The loop unroll may also backfire, for you have all these +N things in there and the i+=8 as well. Benchmark to be sure.

ps also check memory alignment: this will be fastest when m1&0xff == m2&0xff == 0

edited Aug 24 '10 at 22:42

answered Aug 24 '10 at 22:36

mvds

45,755
8
102
111

Thanks for the advice, I'll most certainly implement it, although I am not entirely sure what m1&0xff == m2&0xff == 0 is supposed to do, from what I know m1&0xff == m1, isn't that correct? – Neo_b Aug 24 '10 at 23:55
This will be faster in some cases, but it could result in some issues. First of all, it relies on your platform having the same alignment for 64 bit integers as it does for characters, which is often not the case. (Nobody says the base of the character array has to be on an 8 byte boundary) Second, a builtin intrinsic or assembler will probably be faster. On x86, the memory alignment issue will just make things slower, and on other architectures, it will cause the processor to throw an interrupt. – Billy ONeal Aug 25 '10 at 02:13
@Neo_b: `m1&0xff == 0` is a test if the address `m1` ends with `00`. @Billy: Of course in these kind of optimisations you must fiddle a little with the boundaries, so until the first aligned block you test slow, then test as much blocks as possible fast, and test the remainder slow. (as stated these things only work out positively if the blocks are big enough) A builtin intrinsic or assembler will probably be faster *if it would exists* which I think is not the case for the problem at hand. – mvds Aug 25 '10 at 17:20
Oh, right, my mistake (the 0xff without zeros before somehow illogically tricked me into thinking that m1 was a byte). – Neo_b Aug 27 '10 at 02:48

Memory comparison (with difference position)

6 Answers6