25

When I make a call to the C string compare function like this:

strcmp("time","time")

It returns 0, which implies that the strings are not equal.

Can anyone tell me why C implementations seem to do this? I would think it would return a non-zero value if equal. I am curious to the reasons I am seeing this behavior.

phuclv
  • 37,963
  • 15
  • 156
  • 475
Xenu
  • 427
  • 1
  • 5
  • 10
  • Time != money directly - you have to employ a unit conversion first. – Adam Davis Feb 27 '09 at 16:28
  • 3
    @remeador: I can ask as many stupid joke questions as I wish. To those interested, the original question was: "Why does strcmp("time","money") return 0 instead of non-zero?". – Xenu Mar 06 '09 at 15:16
  • 2
    edit: return non-zero instead of 0. – Xenu Mar 06 '09 at 15:18

7 Answers7

36

strcmp returns a lexical difference (or should i call it "short-circuit serial byte comparator" ? :-) ) of the two strings you have given as parameters. 0 means that both strings are equal

A positive value means that s1 would be after s2 in a dictionary.

A negative value means that s1 would be before s2 in a dictionary.

Hence your non-zero value when comparing "time" and "money" which are obviously different, even though one would say that time is money ! :-)

Benoît
  • 16,798
  • 8
  • 46
  • 66
  • 3
    strcmp() does not do a lexical comparison, it merely compares the value of each character until there is a difference or both strings terminate. – Ferruccio Feb 27 '09 at 16:35
  • I could swear I've read docs that said strcmp() does a lexical comparison, but some quick googling suggests Ferruccio is correct... could it have changed at some point? – rmeador Feb 27 '09 at 16:38
  • 1
    Hmmm I'm pretty sure it's a lexical comparison ! http://www.cplusplus.com/reference/clibrary/cstring/strcmp.html – Benoît Feb 27 '09 at 16:52
  • 1
    I think it changes with the given definition of "lexical". I too have seen strcmp() referred to as being a 'lexical' comparator, though obviously it does not take different localizations, collations, etc. into account. Maybe it is better to call it a "short-circuit serial byte comparator." – aib Feb 27 '09 at 16:52
  • It simply uses each character's ASCII value for comparison which are ordered like: ABCDE...abcde. A lexical comparison would order them like: AaBbCcDdEe. – Ferruccio Feb 27 '09 at 17:00
  • @Ferruccio I updated my answer. But you'll admit that the idea behind the comparison is lexical ! :-) – Benoît Feb 27 '09 at 17:02
  • there is a strcoll. i think that one takes the current locale into account – Johannes Schaub - litb Feb 27 '09 at 17:11
  • @litb, right it does, but for some strange reasons it falls back to strcmp if the locale is set to 'C' or 'POSIX' and not to strcasecmp, which would be more appropriate. – quinmars Feb 28 '09 at 00:16
19

The nice thing about an implementation like this is you can say

if(strcmp(<stringA>, <stringB>) > 0)   // Implies stringA > stringB
if(strcmp(<stringA>, <stringB>) == 0)  // Implies stringA == stringB
if(strcmp(<stringA>, <stringB>) < 0)   // Implies stringA < stringB
if(strcmp(<stringA>, <stringB>) >= 0)  // Implies stringA >= stringB
if(strcmp(<stringA>, <stringB>) <= 0)  // Implies stringA <= stringB
if(strcmp(<stringA>, <stringB>) != 0)  // Implies stringA != stringB

Note how the comparison with 0 exactly matches the comparison in the implication.

Daniel LeCheminant
  • 50,583
  • 16
  • 120
  • 115
  • 7
    There's a wonderful/horrible macro from the comp.lang.c FAQ that implements almost exactly this string behavior. `#define StrTest(str1, op, str2) (strcmp(str1, str2) op 0)` With it, you would write `if(StrTest(stringA, ==, stringB))` and friends. I'm on the fence as to whether it's a horrible idea or a wonderful idea. – Chris Lutz Sep 02 '09 at 04:45
7

It's common to functions to return zero for the common - or one-of-a-kind - case and non-zero for special cases. Take the main function, which conventionally returns zero on success and some nonzero value for failure. The precise non-zero value indicates what went wrong. For example: out of memory, no access rights or something else.

In your case, if the string is equal, then there is no reason why it is equal other than that the strings contain the same characters. But if they are non-equal then either the first can be smaller, or the second can be smaller. Having it return 1 for equal, 0 for smaller and 2 for greater would be somehow strange i think.

You can also think about it in terms of subtraction:

return = s1 - s2

If s1 is "lexicographically" less, then it will give is a negative value.

Johannes Schaub - litb
  • 496,577
  • 130
  • 894
  • 1,212
4

Another reason strcmp() returns the codes it does is so that it can be used directly in the standard library function qsort(), allowing you to sort an array of strings:

#include <string.h> // for strcmp()
#include <stdlib.h> // for qsort()
#include <stdio.h>

int sort_func(const void *a, const void *b)
{
    const char **s1 = (const char **)a;
    const char **s2 = (const char **)b;
    return strcmp(*s1, *s2);
}

int main(int argc, char **argv)
{
    int i;
    printf("Pre-sort:\n");
    for(i = 1; i < argc; i++)
        printf("Argument %i is %s\n", i, argv[i]);
    qsort((void *)(argv + 1), argc - 1, sizeof(char *), sort_func);
    printf("Post-sort:\n");
    for(i = 1; i < argc; i++)
        printf("Argument %i is %s\n", i, argv[i]);
    return 0;
}

This little sample program sorts its arguments ASCIIbetically (what some would call lexically). Lookie:

$ gcc -o sort sort.c
$ ./sort hi there little fella
Pre-sort:
Argument 1 is hi
Argument 2 is there
Argument 3 is little
Argument 4 is fella
Post-sort:
Argument 1 is fella
Argument 2 is hi
Argument 3 is little
Argument 4 is there

If strcmp() returned 1 (true) for equal strings and 0 (false) for inequal ones, it would be impossible to use it to obtain the degree or direction of inequality (i.e. how different, and which one is bigger) between the two strings, thus making it impossible to use it as a sorting function.

I don't know how familiar you are with C. The above code uses some of C's most confusing concepts - pointer arithmetic, pointer recasting, and function pointers - so if you don't understand some of that code, don't worry, you'll get there in time. Until then, you'll have plenty of fun questions to ask on StackOverflow. ;)

Chris Lutz
  • 73,191
  • 16
  • 130
  • 183
3

You seem to want strcmp to work like a (hypothetical)

int isEqual(const char *, const char *)

To be sure that would be true to the "zero is false" interpretation of integer results, but it would complicate the logic of sorting because, having established that the two strings were not the same, you would still need to learn which came "earlier".

Moreover, I suspect that a common implementation looks like

int strcmp(const char *s1, const char *s2){
   const unsigned char *q1=s1, *q2=s2;
   while ((*q1 == *q2) && *q1){ 
      ++q1; ++q2; 
   };
   return (*q1 - *q2);
}

which is [edit: kinda] elegant in a K&R kind of way. The important point here (which is increasingly obscured by getting the code right (evidently I should have left well enough alone)) is the way the return statement:

   return (*q1 - *q2);

which gives the results of the comparison naturally in terms of the character values.

dmckee --- ex-moderator kitten
  • 98,632
  • 24
  • 142
  • 234
2

There's three possible results: string 1 comes before string 2, string 1 comes after string 2, string 1 is the same as string 2. It is important to keep these three results separate; one use of strcmp() is to sort strings. The question is how you want to assign values to these three outcomes, and how to keep things more or less consistent. You might also look at the parameters for qsort() and bsearch(), which require compare functions much like strcmp().

If you wanted a string equality function, it would return nonzero for equal strings and zero for non-equal strings, to go along with C's rules on true and false. This means that there would be no way of distinguishing whether string 1 came before or after string 2. There are multiple true values for an int, or any other C data type you care to name, but only one false.

Therefore, having a useful strcmp() that returned true for string equality would require a lot of changes to the rest of the language, which simply aren't going to happen.

David Thornley
  • 56,304
  • 9
  • 91
  • 158
0

I guess it is simply for symmetry: -1 if less, 0 if equal, 1 if more.

PhiLho
  • 40,535
  • 6
  • 96
  • 134