1

I found an implementation of the function strcmp and showed it to a friend, who said the following: "It's worth noting that it doesn't always return the difference between the two differing characters; it is actually permitted to return any integer provided the sign is the same as the difference between the bytes." He gave me no further explanation. The code is this:

int
strcmp(s1, s2)
    register const char *s1, *s2;
{
    while (*s1 == *s2++)
        if (*s1++ == 0)
            return (0);
    return (*(const unsigned char *)s1 - *(const unsigned char *)(s2 - 1));
}

Can someone explain what the error is, and what kind of string can cause a failure?

Kevin
  • 5
    What in what your friend said led you to believe there is an error in that code? – Sneftel Mar 15 '15 at 17:12
  • 1
    It is not an error, it's merely "worth noting". See [a description of the result](http://en.cppreference.com/w/c/string/byte/strcmp), which also uses the word *sign*. (Worth noting: that page mentions *unsigned* comparisons.) – Jongware Mar 15 '15 at 17:12
  • 4
    What I find disturbing here is the use of the ancient and deprecated way of declaring function parameters. – glglgl Mar 15 '15 at 17:13
  • 2
    One of your friends said ["it just returns the difference" but the other friend said it's not necessarily true](http://stackoverflow.com/a/12136367/1275169). – P.P Mar 15 '15 at 17:15
  • 1
    @glglgl: agree — and what's even more intriguing is the presence of `const` (a feature of standard C) in a declaration using the pre-standard (K&R) function definition/declaration notation. – Jonathan Leffler Mar 15 '15 at 17:19
  • 1
    In fact implementations can [vary based on optimization level](http://stackoverflow.com/a/27751263/1708801) – Shafik Yaghmour Mar 15 '15 at 17:31
  • @Jonathan Leffler C89 introduced `const` and also allowed this style of function declaration. It just looks like coding style that was popular in the early 90s that attempted to work with standard and pre-standard compilers. – chux - Reinstate Monica Mar 15 '15 at 20:09
  • 2
    @chux: I am not disputing its legality. It just seems eccentric to add a feature (`const`) to the old style definition. – Jonathan Leffler Mar 15 '15 at 20:11
  • @Jonathan Leffler How would you have coded it for C89/pre-standard dual compilation? By including `const`, the code protects the function body from a coding error that changes `*s1`, etc. For the pre-C89 compilation, `const` would simply have been eliminated via `#define const`. Not so eccentric, yet certainly old style. – chux - Reinstate Monica Mar 15 '15 at 20:20

3 Answers

5

What your friend means is: strcmp returns an integer that is greater than, equal to, or less than 0. It is not required to return the actual difference between the two characters. However, it is not an error to do so.

The major problem with this implementation is that it uses a K&R-style function definition, the pre-standard form of C used in the first edition of The C Programming Language by Brian Kernighan and Dennis Ritchie. You should use standard C prototypes instead.
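
As a minimal sketch, the same loop can be written with a standard C prototype. The names `strcmp_std` and `strcmp_std_selfcheck` are hypothetical, chosen only to avoid clashing with the library function; the logic is unchanged from the code in the question.

```c
#include <assert.h>

/* Hypothetical name: avoids clashing with the library's strcmp. */
int strcmp_std(const char *s1, const char *s2)
{
    /* Advance both pointers while the characters match. */
    while (*s1 == *s2++)
        if (*s1++ == '\0')
            return 0;   /* reached the terminator: the strings are equal */

    /* s2 was already advanced past the mismatch, hence s2 - 1. */
    return *(const unsigned char *)s1 - *(const unsigned char *)(s2 - 1);
}

/* Hypothetical self-check: callers should rely only on the sign. */
void strcmp_std_selfcheck(void)
{
    assert(strcmp_std("abc", "abc") == 0);
    assert(strcmp_std("abc", "abd") < 0);
    assert(strcmp_std("abd", "abc") > 0);
}
```

Calling `strcmp_std_selfcheck()` from `main` exercises the three possible outcomes.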

Yu Hao
1

strcmp is not required to return the actual difference between the first pair of differing characters. It returns a positive integer, a negative integer, or zero.

Confusion about this has caused major security vulnerabilities in programs like MySQL.

"The problem is that the value returned from these comparison functions is sometimes misunderstood by developers, so they make mistakes like thinking these functions can return only -1, 0, or 1. Or, they might think the return value can be safely cast to a smaller type such as char, but they don't realize that the truncation of the value might result in two memory regions being considered equal when they aren't." [1]

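The truncation problem described in the quote can be illustrated with a small sketch. The comparator `cmp_returns_256` is hypothetical: it stands in for any comparison routine that happens to report "greater than" as 256, which the specification allows. The sketch assumes an 8-bit `char` and uses `unsigned char` as the narrow type so that the conversion is well defined.

```c
#include <assert.h>

/* Hypothetical comparator: reports "greater than" as 256.
   Any positive value is a legal "greater than" result. */
static int cmp_returns_256(void)
{
    return 256;
}

/* Buggy pattern: storing the result in a narrower type.
   With an 8-bit char, 256 becomes 0 and the inputs look equal. */
int looks_equal_after_truncation(void)
{
    unsigned char narrowed = (unsigned char)cmp_returns_256();
    return narrowed == 0;
}

/* Correct pattern: keep the full int before testing it. */
int looks_equal_correct(void)
{
    int result = cmp_returns_256();
    return result == 0;
}

/* Hypothetical self-check, assuming an 8-bit char. */
void truncation_selfcheck(void)
{
    assert(looks_equal_after_truncation());   /* wrong: 256 became 0 */
    assert(!looks_equal_correct());           /* right: 256 is nonzero */
}
```
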
Have a look at this patch from Wine's repo:

+    ret = strcmp(file1, file2);
+    if (ret < 0) return -1;
+    if (ret > 0) return  1;
+    return  0;
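
The patch clamps the result to -1, 0, or 1, so callers that wrongly depend on those exact values keep working. A compact sketch of the same idea follows; the names `sign_of`, `strcmp_sign`, and `strcmp_sign_selfcheck` are hypothetical and are not part of Wine or the C library.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: maps any int to -1, 0, or 1. */
static int sign_of(int value)
{
    return (value > 0) - (value < 0);
}

/* Hypothetical wrapper: strcmp with its result limited to -1, 0, 1. */
int strcmp_sign(const char *a, const char *b)
{
    return sign_of(strcmp(a, b));
}

/* Hypothetical self-check covering all three outcomes. */
void strcmp_sign_selfcheck(void)
{
    assert(strcmp_sign("apple", "banana") == -1);
    assert(strcmp_sign("banana", "apple") == 1);
    assert(strcmp_sign("apple", "apple") == 0);
}
```

New code is usually better off comparing the result of `strcmp` against 0 directly, which needs no wrapper at all.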

References: [1]

Arjun Sreedharan
  • The specification of `strcmp` says: _The strcmp function returns an integer greater than, equal to, or less than zero, accordingly as the string pointed to by `s1` is greater than, equal to, or less than the string pointed to by `s2`_ so any positive or negative value is permissible. Code assuming that the range will be {-1, 0, +1} is unforgivably (but not irremediably) broken. Code assuming that the return value will be a measure of the distance between the two character codes that differ is also unforgivably (but not irremediably) broken. You cannot assume either behaviour: both are valid. – Jonathan Leffler Mar 15 '15 at 17:25
  • 1
    The patch, therefore, is applying a band-aid to unforgivably broken code. The people who wrote the code assuming `strcmp()` returns {-1, 0, +1} were simply unaware of the way the tools they use are defined and designed to work. I'm not saying the patch was not needed in the context — it may have been the quickest, and perhaps even best, fix. But the original code was badly written or it could not have run into problems. – Jonathan Leffler Mar 15 '15 at 17:27
  • @Jonathan Leffler what would be the correct way to write this function? – Kevin Mar 15 '15 at 17:31
0

Here is the Apple implementation of strcmp():

int strcmp(const char *s1, const char *s2)
{
    for ( ; *s1 == *s2; s1++, s2++)
        if (*s1 == '\0')
            return 0;
    return ((*(unsigned char *)s1 < *(unsigned char *)s2) ? -1 : +1);
}

Here is a version from jbox:

int strcmp(const char *s1, const char *s2)
{
    int ret = 0;

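    /* Stop at the first differing byte or at the end of s2. */
    while (!(ret = *(unsigned char *) s1 - *(unsigned char *) s2)
           && *s2)
        ++s1, ++s2;

    /* Clamp the difference to -1, 0, or 1. */
    if (ret < 0) ret = -1;
    else if (ret > 0) ret = 1;

    return ret;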
}

Here is a wiki implementation:

int strcmp(const char* s1, const char* s2)
{
    while(*s1 && (*s1==*s2))
        s1++,s2++;
    return *(const unsigned char*)s1-*(const unsigned char*)s2;
}

Here is a charsharp.com implementation:

int strcmp_ptr(char *src1, char *src2)
{
    int i=0;
    while((*src1!='\0') || (*src2!='\0'))
    {
        if(*src1 > *src2)
            return 1;
        if(*src1 < *src2)
            return -1;
        src1++;
        src2++;
    }
    return 0;
}

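Notice that they all satisfy the requirement stated in the Linux man page for strcmp, at least for plain ASCII strings. The last version compares plain `char` values rather than `unsigned char`, so where `char` is signed it can order bytes above 127 differently from the others. Here is what the man page says about the returned value: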

"The strcmp() and strncmp() functions return an integer less than, equal to, or greater than zero if s1 (or the first n bytes thereof) is found, respectively, to be less than, to match, or be greater than s2."

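The C standard defines the ordering in terms of bytes interpreted as `unsigned char`, which is why most of the versions above cast before comparing. As a rough sketch, the check below confirms that a hand-written version agrees in sign with the library's `strcmp`, including for a byte above 127. The names `strcmp_wiki`, `same_sign`, `agrees_with_library`, and `agreement_selfcheck` are hypothetical; `strcmp_wiki` repeats the loop of the wiki version.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical name; same loop as the wiki version above. */
static int strcmp_wiki(const char *s1, const char *s2)
{
    while (*s1 && (*s1 == *s2))
        s1++, s2++;
    return *(const unsigned char *)s1 - *(const unsigned char *)s2;
}

/* Hypothetical helper: true when a and b are both negative,
   both zero, or both positive. */
static int same_sign(int a, int b)
{
    return ((a > 0) - (a < 0)) == ((b > 0) - (b < 0));
}

/* Returns 1 when strcmp_wiki and the library strcmp agree in sign. */
int agrees_with_library(const char *a, const char *b)
{
    return same_sign(strcmp_wiki(a, b), strcmp(a, b));
}

/* Hypothetical self-check; only signs are compared, never magnitudes. */
void agreement_selfcheck(void)
{
    assert(agrees_with_library("abc", "abc"));
    assert(agrees_with_library("abc", "abd"));
    assert(agrees_with_library("\x80", "a"));   /* byte above 127 */
    assert(agrees_with_library("a", "\x80"));
}
```

The check compares signs only, because the magnitude of a nonzero result is not specified.
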
Swordfish
user3629249