-1

String comparison seems to be a staple of most languages; they all have a function that resembles C's strcmp to some extent. Its return value is usually described like this:

The strcmp() and strncmp() functions return an integer less than, equal to, or greater than zero if s1 (or the first n bytes thereof) is found, respectively, to be less than, to match, or be greater than s2.

Pretty much all there is to take away from that is that if the result is 0, the strings are equal (they have identical contents), and if it's nonzero, they are not.

However, where does the nonzero int value come from if the strings are unequal? What does it mean? And what precisely does it mean for one string to be "greater than" or "less than" another, since they're not numeric values?

Thank you for your time. I've never seen an explanation of string comparison functions that goes beyond saying that 0 implies equality and nonzero implies inequality.

Jake

2 Answers

4

Think of the simplest possible C implementation of the function:

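int strcmp(const char *p1, const char *p2)
{
    int diff;
    do
        /* Compare as unsigned char, which is how the C standard defines the ordering. */
        diff = (unsigned char)*p1 - (unsigned char)*p2;
    while (*p1++ && *p2++ && !diff);  /* stop at a terminator or at the first mismatch */
    return diff;
}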

The returned value happens to have the proper sign, but the value itself is just an artifact of the comparison process. That's why the value is left unspecified, to give implementors the widest possible latitude for an efficient implementation.
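To illustrate that latitude, here is a rough sketch of a second, equally valid way to produce the result. The name `strcmp_sign_only` is made up for this example and is not a standard function; it is only meant to show that two implementations can agree on the sign while disagreeing on the magnitude.

```c
/* Hypothetical alternative to the version above: same ordering, but the
   result is always -1, 0, or +1 instead of a byte difference. */
int strcmp_sign_only(const char *p1, const char *p2)
{
    const unsigned char *a = (const unsigned char *)p1;
    const unsigned char *b = (const unsigned char *)p2;

    /* Advance while the bytes match and the terminator has not been reached. */
    while (*a != '\0' && *a == *b) {
        a++;
        b++;
    }

    /* Compare the first differing bytes (or the two terminators). */
    return (*a > *b) - (*a < *b);
}
```

A caller that only tests `result < 0`, `result == 0`, or `result > 0` behaves the same with either version, which is why portable code should never test for a specific nonzero value.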

Mark Ransom
  • Ugh, `do...while` without braces. :( – celticminstrel Jul 12 '15 at 03:54
  • @celticminstrel it's the first time I've ever done it, I tried it with a compiler just to make sure it actually worked. But since my point was about "the simplest possible" it seemed appropriate to leave them out. Might have made more sense if it was all compressed into a single line. – Mark Ransom Jul 12 '15 at 04:11
  • For a long time I had no idea it was allowed. Then I saw it in the Blades of Exile source code and was shocked. Though I tend to omit braces for other flow control constructs, with do...while I think it looks weird and ugly. I don't think it really simplifies anything, anyway. – celticminstrel Jul 12 '15 at 04:25
2

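Strings are compared using lexicographical order, which at its simplest is what you would think of as “dictionary ordering”: you compare the characters in turn until you reach the first position where they differ. The string apple is less than the string banana because the character a precedes the character b, but apple follows abacus because p follows b. If one string is a prefix of the other, the shorter one is less. Note that strcmp orders characters by their numeric values (interpreted as unsigned char), not by locale rules; locale-aware comparison is what strcoll is for.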
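The ordering can be checked directly with strcmp. This is a small sketch; `precedes` is a hypothetical helper name, not part of the standard library, and the uppercase case mentioned below assumes an ASCII-compatible character set.

```c
#include <string.h>

/* Hypothetical helper: returns 1 if a sorts strictly before b under strcmp,
   0 otherwise. Only the sign of strcmp's result is inspected. */
int precedes(const char *a, const char *b)
{
    return strcmp(a, b) < 0;
}
```

With this helper, `precedes("apple", "banana")` is true and `precedes("apple", "abacus")` is false. Under ASCII, `precedes("Zebra", "apple")` is also true, because uppercase letters have smaller codes than lowercase ones, which is one place where strcmp differs from a real dictionary.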

strcmp doesn’t make any particular guarantees about its nonzero return values beyond the sign. Some implementations return simply -1, 0, and +1, and others return the difference between the first mismatched characters, so you can’t rely on any particular magnitude. The standard could just as well have specified strcmp to return a more specific enumeration:

enum Ordering {
  LT,
  EQ,
  GT
};

But many C standard library functions accept and return “magical” int values as a matter of historical accident.
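If you want that kind of interface in your own code, a thin wrapper is enough. The following is only a sketch: `compare_strings` is a hypothetical name, and the enum is repeated so the snippet compiles on its own.

```c
#include <string.h>

enum Ordering {
  LT,
  EQ,
  GT
};

/* Hypothetical wrapper: maps the sign of strcmp's result onto the enum,
   ignoring the magnitude entirely. */
enum Ordering compare_strings(const char *a, const char *b)
{
    int r = strcmp(a, b);
    if (r < 0)
        return LT;
    if (r > 0)
        return GT;
    return EQ;
}
```

Callers can then `switch` on the result instead of remembering which sign means what.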

Jon Purdy
  • It usually does provide a guarantee about the *sign* of the returned value, and that's important. What it doesn't guarantee is the magnitude of a non-zero returned value. – Daniel Jul 12 '15 at 02:47
  • @Daniel: Yes. I was not explicit enough. Updated. – Jon Purdy Jul 12 '15 at 02:50
  • So this is what I was thinking: the magnitude of the value might differ depending on which implementation one is using. – Jake Jul 12 '15 at 02:54