
I'm learning C and am currently studying string handling. The book I'm studying from defines strcmp() as follows:

This is a function which compares two strings to find out whether they are same or different. The two strings are compared character by character until there is a mismatch or end of one of the strings is reached, whichever occurs first. If the two strings are identical, strcmp( ) returns a value zero. If they’re not, it returns the numeric difference between the ASCII values of the first non-matching pairs of characters.

A sample program is given, which is what my question is about:

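#include <stdio.h>
#include <string.h>

int main( void )
{
    char string1[ ] = "Jerry" ;
    char string2[ ] = "Ferry" ;
    int i, j, k ;
    i = strcmp ( string1, "Jerry" ) ;      /* identical strings */
    j = strcmp ( string1, string2 ) ;      /* 'J' versus 'F' */
    k = strcmp ( string1, "Jerry boy" ) ;  /* '\0' versus ' ' */
    printf ( "\n%d %d %d", i, j, k ) ;
    return 0 ;
}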

I ran this program in Dev-C++ on my Windows (64-bit) machine and got this output: 0 1 -1

Now, the book gives the output as 0 4 -32, with this reasoning:

In the first call to strcmp( ), the two strings are identical—“Jerry” and “Jerry”—and the value returned by strcmp( ) is zero. In the second call, the first character of “Jerry” doesn't match with the first character of “Ferry” and the result is 4, which is the numeric difference between ASCII value of ‘J’ and ASCII value of ‘F’. In the third call to strcmp( ) “Jerry” doesn’t match with “Jerry boy”, because the null character at the end of “Jerry” doesn’t match the blank in “Jerry boy”. The value returned is -32, which is the value of null character minus the ASCII value of space, i.e., ‘\0’ minus ‘ ’, which is equal to -32.

To confirm what the book says, I added this code to my program, just to verify the ASCII difference between J and F:

printf("\n Ascii value of J is %d", 'J' );
printf("\n Ascii value of F is %d", 'F' );

and got this output accordingly:

 Ascii value of J is 74
 Ascii value of F is 70

This matches what the book says. However, as you can see, I get different values for j and k, that is, when the strings don't match. I did look for similar questions on SO and found some, but could not find a definite answer for the different output (where it returns 1 and -1), hence I decided to ask a new question.

This question here seems to be somewhat similar, and its description contains the following information about strcmp():

The strcmp() and strncmp() functions return an integer less than, equal to, or greater than zero if s1 (or the first n bytes thereof) is found, respectively, to be less than, to match, or be greater than s2

In one of the answers, I came across this link, which documents the behaviour of strcmp(). It further says:

The strcmp() function shall compare the string pointed to by s1 to the string pointed to by s2.

The sign of a non-zero return value shall be determined by the sign of the difference between the values of the first pair of bytes (both interpreted as type unsigned char) that differ in the strings being compared.

RETURN VALUE

Upon completion, strcmp() shall return an integer greater than, equal to, or less than 0, if the string pointed to by s1 is greater than, equal to, or less than the string pointed to by s2, respectively.

So, after reading all this, I'm inclined to think that, irrespective of the implementation or platform being used, the return value of strcmp() should only be treated as falling into one of three categories (zero, positive or negative), rather than relying on the exact value returned.

Am I correct in my understanding?

Manish Giri
  • *after reading all this, I'm inclined to think that 0, 1 or -1 are the only possible outcomes the strcmp()* How do you come to that conclusion? Read again the paragraph above you quoted from POSIX, this is not what is specified. – ouah Jul 29 '14 at 12:22
  • From where did you get the "ascii code difference" definition? I've never seen anything other than "return 0 if equal, negative value if first string is less than the second and positive if first is greater than second". See: http://en.wikibooks.org/wiki/C_Programming/C_Reference/string.h/strcmp – Jan Spurny Jul 29 '14 at 12:26
  • The phrase “The sign of a non-zero return value shall be determined by the sign of the difference between …” means that the sign of the result is the same as the sign of the difference between …, not that the result has to be -1, 0 or 1. – Pascal Cuoq Jul 29 '14 at 12:26
  • The GNU C Library implementation will return the difference between the two characters https://sourceware.org/git/?p=glibc.git;a=blob_plain;f=string/strcmp.c;h=8229d7c773b361a1587cac2cfc5d9b12ba29255a;hb=1f529f7d8456f09109a8e942581f89f10a901ed0 – phuclv Jul 29 '14 at 12:31
  • Your study book is making a wrong assumption here: "it returns the numeric difference between the ASCII values". You could file a bug report to its publisher. Refer to this particular thread for specifics :-) – Jongware Jul 29 '14 at 12:31
  • another implementation http://fossies.org/dox/glibc-2.19/strcmp_8c_source.html – phuclv Jul 29 '14 at 12:32
  • So, for all practical purposes, I should bother about the zero and/or the sign of the integer being returned from strcmp(), right? Like an implementation of a password checker, for instance. If zero returns, then passwords match, etc. And when two strings are being compared, the sign of the returned integer should decide which string is larger/greater than the other? – Manish Giri Jul 29 '14 at 12:54
  • @DarkKnight Yes, exactly. You should *never* rely on the exact values, though. – The Paramagnetic Croissant Jul 29 '14 at 13:07
  • The specification "it returns the numeric difference between the ASCII values of the first non-matching pairs of characters" is wrong. The characters don't need to be ASCII. And the C standard (C11 [§7.24.4.2 The `strcmp` function](http://port70.net/~nsz/c/c11/n1570.html#7.24.4.2)) says: _The `strcmp` function returns an integer greater than, equal to, or less than zero, accordingly as the string pointed to by `s1` is greater than, equal to, or less than the string pointed to by `s2`._ . That does not specify that the return value is the difference between the two characters that differ. – Jonathan Leffler Jan 17 '20 at 08:22
  • And yes, that means your first source of information is wrong on a simple, standard C function, which bodes ill for the rest of the book. There's a bin over in the corner; throw the book into it. – Jonathan Leffler Jan 17 '20 at 08:25

4 Answers


Here is a simple implementation of strcmp() in C from libc from Apple:

int
strcmp(const char *s1, const char *s2)
{
    for ( ; *s1 == *s2; s1++, s2++)
        if (*s1 == '\0')
            return 0;
    return ((*(unsigned char *)s1 < *(unsigned char *)s2) ? -1 : +1);
}

FreeBSD's libc implementation:

int
strcmp(const char *s1, const char *s2)
{
    while (*s1 == *s2++)
        if (*s1++ == '\0')
            return (0);
    return (*(const unsigned char *)s1 - *(const unsigned char *)(s2 - 1));
}

Here is the implementation from GNU libc, which returns the difference between characters:

int
strcmp (p1, p2)
     const char *p1;
     const char *p2;
{
  const unsigned char *s1 = (const unsigned char *) p1;
  const unsigned char *s2 = (const unsigned char *) p2;
  unsigned char c1, c2;

  do
    {
      c1 = (unsigned char) *s1++;
      c2 = (unsigned char) *s2++;
      if (c1 == '\0')
        return c1 - c2;
    }
  while (c1 == c2);

  return c1 - c2;
}

That's why most comparisons that I've read are written as < 0, == 0 or > 0 when the code does not need to know the exact difference between the characters in the strings.
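
To make the sign-only style concrete, here is a minimal sketch. The helper name describe_order is hypothetical (it is not part of any library); it looks only at whether the strcmp() result is zero, negative or positive, so it should behave the same with any of the three implementations shown above.

```c
#include <string.h>

/* Hypothetical helper (not a library function): classifies the result of
 * strcmp() by its sign only, never by its exact value. */
const char *describe_order(const char *a, const char *b)
{
    int r = strcmp(a, b);

    if (r == 0)
        return "equal";                    /* strings are identical */
    return (r < 0) ? "less" : "greater";   /* only the sign is used */
}
```

With the strings from the question, describe_order("Jerry", "Ferry") yields "greater" and describe_order("Jerry", "Jerry boy") yields "less", whether the underlying strcmp() returns 1 and -1 or 4 and -32.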

denisvm
  • it's the same in Apple's open source lib http://opensource.apple.com/source/Libc/Libc-262/ppc/gen/strcmp.c – phuclv Jul 29 '14 at 12:26
  • I think that the `unsigned` above is wrong and both its occurrences should be removed. On some systems `char`-s are signed, on others they are not. – Basile Starynkevitch Jul 29 '14 at 12:36
  • Your last paragraph is misleading, as it seems to imply that all implementations will return -1, 0, or +1, which is incorrect. – interjay Jul 29 '14 at 12:43
  • @BasileStarynkevitch The standard says to interpret the characters as `unsigned char` when comparing strings. – interjay Jul 29 '14 at 12:45
  • Yes, sorry, my mistake! – Basile Starynkevitch Jul 29 '14 at 12:50
  • @interjay I said that I've seen implementations returning those three values instead of the difference. Anyway, I've updated the answer with the GNU libc implementation, which returns the difference between the differing characters. – denisvm Jul 29 '14 at 13:08

Upon completion, strcmp() shall return an integer greater than, equal to, or less than 0, if the string pointed to by s1 is greater than, equal to, or less than the string pointed to by s2, respectively.

And you write:

So, after reading all this, I'm inclined to think that 0, 1 or -1 are the only possible outcomes the strcmp() function.

Why? It's exactly that the actual value of the returned integer is not specified, only its sign.

  • I'm not sure I understand what you mean by this statement: _Why? It's exactly that the actual value of the returned integer is not specified, only its sign_ Could you please elaborate? And second, why am I getting the output which I got? – Manish Giri Jul 29 '14 at 12:25
  • @DarkKnight What do you not understand in that? You conclude the opposite of what the first paragraph says. It says that the return value must be positive, negative or 0. It does not say that it has to be -1 if negative or +1 if positive. – The Paramagnetic Croissant Jul 29 '14 at 12:26
  • Okay, I get what the paragraph says. I just don't get the logic behind my output. Why do I get +1 and -1 for the second and third cases? – Manish Giri Jul 29 '14 at 12:34
  • @DarkKnight I don't know the particular implementation detail. It's valid and standard-compliant, and that's all that matters. – The Paramagnetic Croissant Jul 29 '14 at 12:36
  • This "answer" doesn't answer the question; it only shows that OP doesn't know the answer. – anatolyg Jul 29 '14 at 12:40
  • @anatolyg what? how so? – The Paramagnetic Croissant Jul 29 '14 at 12:55

The C language specification is a document written in English.

The members of the standardization committee carefully chose their words to permit implementors to make their own implementation choices.

On some hardware (or implementations), returning an arbitrary integer (within the constraints of the specification) could be faster (or simpler, or smaller code) than returning only -1, 0, 1 (as the function proposed in dvm's answer does). FWIW, musl-libc's strcmp.c is shorter and can return integers other than -1, 0, 1, yet it still conforms to the standard.

BTW, with GCC & GNU libc (e.g. on most Linux systems) the strcmp function may be handled, notably when optimizing, as a compiler builtin: __builtin_strcmp. It can then sometimes be replaced by some very efficient code.

Try compiling the following function (in a file abc.c)

#include <string.h>
int isabc(const char*s) { return strcmp(s, "abc"); }

with optimizations enabled and look at the assembly code. On my Debian/Sid/x86-64 with GCC 4.9.1, compiling with gcc -fverbose-asm -S -O2 abc.c, I see no function calls at all in the produced abc.s (and that isabc may return numbers other than -1, 0, 1).

You should care about portable code, hence you should not expect a particular value (as long as your vendor's strcmp obeys its deliberately loose specification).
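
If some code really does need exactly -1, 0 or 1, one portable option is to derive that value from the sign yourself. The following is only a sketch; strcmp_sign is a hypothetical name, not a standard function.

```c
#include <string.h>

/* Hypothetical wrapper (not a standard function): collapses whatever
 * strcmp() returns into exactly -1, 0 or +1. */
int strcmp_sign(const char *a, const char *b)
{
    int r = strcmp(a, b);

    /* (r > 0) and (r < 0) each evaluate to 0 or 1 in C, so the
     * subtraction gives +1, 0 or -1 without relying on the magnitude. */
    return (r > 0) - (r < 0);
}
```

Under that assumption, the three calls from the question would give 0, 1 and -1 on any conforming implementation, including those whose strcmp() returns the byte difference.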

Read also about undefined behavior; it is a related idea: the language specification is deliberately imprecise to permit various implementors to make different implementation choices.

Basile Starynkevitch

0, 1 and -1 are common return values; however, you should think of the result as zero, positive or negative.

In that case, the meanings are:

  • Zero (0) means that the strings are equal.
  • Negative (-1 or any other negative value) means that the first string is less than the second.
  • Positive (1 or any other positive value) means that the first string is greater than the second.
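
Sorting is a typical consumer of this zero/negative/positive contract: qsort() only requires its comparator to return a value less than, equal to, or greater than zero, so a strcmp() result can be passed straight through. This is a sketch; compare_names and sort_names are hypothetical names chosen for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* qsort() comparator for an array of `const char *`; qsort() hands it
 * pointers to the array elements, hence the extra dereference. */
static int compare_names(const void *lhs, const void *rhs)
{
    const char *a = *(const char *const *)lhs;
    const char *b = *(const char *const *)rhs;

    return strcmp(a, b);   /* only the sign matters to qsort() */
}

/* Hypothetical helper: sorts an array of C strings in place. */
void sort_names(const char **names, size_t count)
{
    qsort(names, count, sizeof names[0], compare_names);
}
```

Sorting "Jerry boy", "Ferry", "Jerry" this way gives "Ferry", "Jerry", "Jerry boy", regardless of the exact non-zero values strcmp() happens to return.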
Jonathan Leffler
ST3