#include <stdio.h>
#include <string.h>

int main(void)
{
    int test1 = 8410092;    // 0x8053EC
    int test2 = 8404974;    // 0x803FEE
    char *t1 = (char *) &test1;
    char *t2 = (char *) &test2;
    int ret2 = memcmp(t1, t2, 4);   // compares the 4 bytes in memory order

    printf("%d\n", ret2);

    return 0;
}

Here's a very basic program that prints -2 when run. Maybe I am totally misunderstanding memcmp, but I thought it returns the difference between the first bytes that differ. Since test1 is a larger number than test2, shouldn't the printed value be positive?

I am using the standard gcc 7 compiler on Ubuntu.

Jonathan Leffler
knowads
  • `memcmp` compares bytes, not `int` values. So endianness is relevant. – Weather Vane Nov 27 '18 at 00:25
  • Also `memcmp` does not specify what the magnitude of the return value means. Only `< 0` or `0` or `> 0`. – Weather Vane Nov 27 '18 at 00:29
  • @WeatherVane I realize the magnitude isn't important. I just don't understand why the signs aren't the same. Are you saying that it's comparing the LSB of test1 with the LSB of test2 first instead of the MSB? – knowads Nov 27 '18 at 00:31
  • As I wrote, the `int` values are irrelevant. `memcmp` compares byte by byte. It knows nothing about your `int` values. For example, suppose you passed it two arrays of two 16-bit values each. Would you expect `memcmp` to know they were, say, `short` values? – Weather Vane Nov 27 '18 at 00:32
  • On your Ubuntu machine, the LSB of each value is stored at the lowest address and is compared first. Since the LSB of test1 (0xEC) is smaller than the LSB of test2 (0xEE), you get a negative value returned; you cannot rely on it being `-2`, only on it being less than zero. If you were on a big-endian machine (e.g. SPARC), then you'd get different results. You'd also get a different result if you had `int test1 = 841088;`. – Jonathan Leffler Nov 27 '18 at 00:34
  • Effectively you are comparing the strings "\xEC\x53\x80\x00" and "\xEE\x3F\x80\x00", because of endianness. So yeah the first one is smaller than the second. – Havenard Nov 27 '18 at 00:36
  • Incidentally, you could avoid `t1` and `t2` and use: `int rc = memcmp(&test1, &test2, sizeof(test1));`. The first two arguments to `memcmp()` are of type `void *` and any object pointer (as opposed to function pointer) type converts automatically to `void *` without needing an explicit cast. – Jonathan Leffler Nov 27 '18 at 00:38
  • @Jonathan Leffler So this means memcmp can return different signs across platforms for the same code? If I were on a big-endian machine, would memcmp return a positive value? – knowads Nov 27 '18 at 00:39
  • Yes (different on different platforms) and yes (big-endian for example values would return a positive value for the shown test values). – Jonathan Leffler Nov 27 '18 at 00:39
  • Apologies. Is there a way I can "force" endianness for this piece of code? Like, I want it to start by comparing x00 and x00 first? – knowads Nov 27 '18 at 00:42
  • With `htonl()` I guess. This function converts whatever endianness the system is using to "network" endianness, which is big-endian and meets the criteria. Though it's rather silly to compare ints with `memcmp()` when the CPU can do it directly. – Havenard Nov 27 '18 at 00:44
  • No, not really. Why do you want to do it? OK; yes, you can forcibly store the value in a big-endian format and then do the `memcmp()` on that, and you will get the answer you seem to seek, but why not just use `test1 < test2` (or whatever comparison operator you're interested in)? – Jonathan Leffler Nov 27 '18 at 00:44
  • @knowads How about just *not using `memcmp`*? If you want to compare numerical scalars, then do so; `memcmp` doesn't do that, so why try to hammer in a nail with a screwdriver? – WhozCraig Nov 27 '18 at 00:44
  • They are different. That's all. – wildplasser Nov 27 '18 at 00:49
  • In the scenario that these aren't ints but just char * to the same set of chars, would the comparators (<, =, >) work as well? – knowads Nov 27 '18 at 00:55
  • @knowads Yeah we realize you are performing the necessary steps to make use of `memcmp()` to compare those integers, but that's precisely the problem. `memcmp()` compares byte per byte and is not suitable to compare multi-byte scalar types because, among other reasons, the way they are structured in memory is architecture specific. – Havenard Nov 27 '18 at 00:59
  • @knowads You can "force endianness" by using a character array: `uint8_t arr[4] = { 0x00, 0x80, 0x53, 0xEC};`. This will guarantee MS byte first no matter the system. But you cannot use the contents of this data as a 32-bit value on a little-endian machine without shuffling it around first. – Lundin Nov 27 '18 at 07:30

2 Answers


As pointed out in the comments, memcmp() performs a byte-by-byte comparison. Here is the relevant quote from the man page:

int memcmp(const void *s1, const void *s2, size_t n);

RETURN VALUE: The memcmp() function returns an integer less than, equal to, or greater than zero if the first n bytes of s1 is found, respectively, to be less than, to match, or be greater than the first n bytes of s2. For a nonzero return value, the sign is determined by the sign of the difference between the first pair of bytes (interpreted as unsigned char) that differ in s1 and s2. If n is zero, the return value is zero.

http://man7.org/linux/man-pages/man3/memcmp.3.html

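When memcmp() is applied to multi-byte integers whose bytes are not all the same, the sign of the result depends on the target endianness, because the byte order decides which pair of bytes is compared first.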

One application of memcmp() is testing whether two large arrays are the same, which can be faster than writing a loop that compares element by element. Refer to this Stack Overflow question for more details: Why is memcmp so much faster than a for loop check?

HappyKeyboard

memcmp compares memory. That is, it compares the bytes used to represent objects. The bytes used to represent objects may vary from one C implementation to another. Per C 2018 6.2.6.1 paragraph 2:

Except for bit-fields, objects are composed of contiguous sequences of one or more bytes, the number, order, and encoding of which are either explicitly specified or implementation-defined.

To compare the values represented by objects, use the ordinary operators <, <=, >, >=, ==, and !=. Comparing the memory of objects with memcmp should be used for limited purposes, such as inserting objects into a tree that only needs to be able to store and retrieve items without caring about what their values mean.

Eric Postpischil