2

From the man page:

The strcmp() and strncmp() functions return an integer less than, equal to, or greater than zero if s1 (or the first n bytes thereof) is found, respectively, to be less than, to match, or be greater than s2.

Example code in C (prints -15 on my machine, swapping test1 and test2 inverts the value):

#include <stdio.h>
#include <string.h>

int main() {
    char* test1 = "hello";
    char* test2 = "world";
    printf("%d\n", strcmp(test1, test2));
}

I found this code (taken from this question) that relies on the values of strcmp being something other than -1, 0 and 1 (it uses the return value in qsort). To me, this is terrible style and depends on undocumented features.

I guess I have two, related questions:

  • Is there something in the C standard that defines what the return values are besides less than, greater than, or equal to zero? If not, what does the standard implementation do?
  • Is the return value consistent across the Linux, Windows and the BSDs?

Edit:

After leaving my computer for 5 minutes, I realized that there is in fact no error with the code in question. I struck out the parts that I figured out before reading the comments/answers, but I left them there to keep the comments relevant. I think this is still an interesting question and may cause hiccups for programmers used to other languages that always return -1, 0 or 1 (e.g. Python seems to do this, but it's not documented that way).

FWIW, I think that relying on something other than the documented behavior is bad style.

beatgammit
  • You're misunderstanding Bentley's code -- there's nothing in there that assumes anything about `strcmp`'s return value other than its sign. – Fred Foo Nov 26 '12 at 19:49
  • "I found this code that relies on the values of strcmp being something other than -1, 0 and 1." I don't see how that code *relies* on anything about the return values of `strcmp`. On the contrary, the code does not rely on any specific return values, which is actually very good style. So, what exactly do you mean? – AnT stands with Russia Nov 26 '12 at 19:50
  • The requirements for a `qsort()` comparator are that it returns "an integer less than, equal to, or greater than zero" which is _exactly_ what `strcmp()` does. There's no abuse of any sort going on. – Hasturkun Nov 26 '12 at 19:50
  • The -15 you're seeing is the result of subtracting ASCII char 119 (`w`) from ASCII char 104 (`h`): `104 - 119 = -15`. I'd suspect that if you change `test1` to `jello` you'll get a result of `-13` (`106 - 119`) and so on. The man page you quoted specifically says "less than"; it doesn't say anything about "-1". IOW, a value less than zero is any negative number, and a value greater than zero is any positive number. – Ken White Nov 26 '12 at 19:54
  • @larsmans - Yup, caught that about 5 minutes after posting... Must have been some bad cheerios in the morning... I feel like such a noob... – beatgammit Nov 26 '12 at 20:05

7 Answers

7

Is there something in the C standard that defines what the return values are besides less than, greater than, or equal to zero?

No. The tightest constraint is that it should be zero, less than zero or more than zero, as specified in the documentation of this particular function.

If not, what does the standard implementation do?

There's no such thing as "the standard implementation". Even if there was, it would probably just

return zero, less than zero or more than zero;

:-)

Is the return value consistent across the Linux, Windows and the BSDs?

I can confirm that it's consistent across Linux and OS X as of 10.7.4 (specifically, it's -1, 0 or +1). I have no idea about Windows, but I bet Microsoft guys use -2 and +3 just to break code :P

Also, let me point out that you have completely misunderstood what the code does.

I found this code (taken from this question) that relies on the values of strcmp being something other than -1, 0 and 1 (it uses the return value in qsort). To me, this is terrible style and depends on undocumented features.

No, it actually doesn't. The C standard library is designed with consistency and ease of use in mind. That is, what qsort() requires is that its comparator function returns a negative or a positive number or zero - exactly what strcmp() is guaranteed to do. So this is not "terrible style", it's perfectly standards-conformant code which does not depend upon undocumented features.

Andreas Grapentin
  • If someone relies on `strcmp` returning -2 or +3 then his/her code deserves to be broken. ;-) – netcoder Nov 26 '12 at 19:49
  • This answer is good, but @Omkant's answer answers both parts. +1 anyway because this answer explains my error very clearly. – beatgammit Nov 26 '12 at 20:20
5

In the C99 standard, §7.21.4.2 The strcmp function:

The strcmp function returns an integer greater than, equal to, or less than zero, accordingly as the string pointed to by s1 is greater than, equal to, or less than the string pointed to by s2.

Emphasis added.

This means the standard does not guarantee -1, 0 or 1; the exact value may vary between implementations.

The value you are getting is the difference between the first pair of characters that differ, h and w.

In your case the strings are hello and world, so 'h' - 'w' = 104 - 119 = -15 < 0, and that's why strcmp returns -15 on your machine.

Jonathan Leffler
Omkant
  • This technically answers my question. I'm still marginally interested in behavior across systems, but it's not really important. – beatgammit Nov 26 '12 at 20:19
4

• Is there something in the C standard that defines what the return values are besides less than, greater than, or equal to zero? If not, what does the standard implementation do?

No. As you mentioned yourself, the man page says less than, equal to, or greater than zero, and that's what the standard says as well.

• Is the return value consistent across the Linux, Windows and the BSDs?

No.

On Linux (OpenSuSE 12.1, kernel 3.1) with gcc, I get -15/15 depending on whether test1 or test2 comes first. On Windows 7 (VS 2010) I get -1/1.

Based on the loose definition of strcmp(), both are fine.


...that relies on the values of strcmp being something other than -1, 0 and 1 (it uses the return value in qsort).

An interesting side note for you... if you take a look at the qsort() man page, the example there is pretty much the same as the Bell code you posted using strcmp(). The reason is that the comparator function qsort() requires is a great fit for the return value of strcmp():

The comparison function must return an integer less than, equal to, or greater than zero if the first argument is considered to be respectively less than, equal to, or greater than the second.

Jonathan Leffler
Mike
  • Interesting, I'm going to guess that most, if not all, unices return the difference between the characters. Interesting that Windows doesn't. – beatgammit Nov 26 '12 at 20:27
  • @tjameson - yeah, they actually list it out in the .asm file of `strcmp` on windows how it will work: `; The instructions below should place -1 in eax if src < dst, ; and 1 in eax if src > dst.` Don't know why they chose to do that instead of just return the subtracted value – Mike Nov 26 '12 at 20:31
1

In reality, the return value of strcmp is likely to be the difference between the values of the bytes at the first position that differed, simply because returning this difference is a lot more efficient than doing an additional conditional branch to convert it to -1 or 1. Unfortunately, some broken software has been known to assume the result fits in 8 bits, leading to serious vulnerabilities. In short, you should never use anything but the sign of the result.

For details on the issues, read this article:

https://communities.coverity.com/blogs/security/2012/07/19/more-defects-like-the-mysql-memcmp-vulnerability

R.. GitHub STOP HELPING ICE
  • Beware: as of 2020-01-17 01:11:00 -07:00 (don't ask), (a) the communities.coverity.com links have a certificate from misc.synopsis.com (and Synopsis.com now seems to own Coverity) so Chrome gives you (valid) scare messages about an insecure connection, and (b) the site is 'down for maintenance'. That might be short or long term maintenance; it will require further testing. It may be necessary to find the information in the WayBack Machine. – Jonathan Leffler Jan 17 '20 at 08:12
1

In this page:

The strcmp() function compares the string pointed to by s1 to the string pointed to by s2. The sign of a non-zero return value is determined by the sign of the difference between the values of the first pair of bytes (both interpreted as type unsigned char) that differ in the strings being compared.

Here is an implementation of strcmp in FreeBSD.

#include <string.h>

/*
 * Compare strings.
 */
int
strcmp(s1, s2)
    register const char *s1, *s2;
{
    while (*s1 == *s2++)
        if (*s1++ == 0)
            return (0);
    return (*(const unsigned char *)s1 - *(const unsigned char *)(s2 - 1));
}
  • This looks like the behavior I'm seeing, and is probably the same as the default Linux implementation. – beatgammit Nov 26 '12 at 20:22
  • I don't think that there is a thing like "the default Linux implementation." `strcmp()` is part of the standard library of the compiler in use, not the operating system in use. However, most on the Linuxes use GCC in one or another version - that might differ. Nowadays Clang gains some share. And then there are several cross compilers, at least on my (Linux) system. – the busybee Jan 17 '20 at 08:20
0

From the manual page:

RETURN VALUE The strcmp() and strncmp() functions return an integer less than, equal to, or greater than zero if s1 (or the first n bytes thereof) is found, respectively, to be less than, to match, or be greater than s2.

It only specifies that the result is greater than, equal to, or less than 0. It doesn't say anything about specific values; those are implementation-specific, I suppose.

CONFORMING TO SVr4, 4.3BSD, C89, C99. This says in which standards it is included. The function must exist and behave as specified, but the specification doesn't say anything about the actual returned values, so you can't rely on them.

LtWorf
0

There's nothing in the C standard that talks about the value returned by strcmp() (that is, other than the sign of that value):

7.21.4.2 The strcmp function

Synopsis

#include <string.h>
int strcmp(const char *s1, const char *s2);

Description

The strcmp function compares the string pointed to by s1 to the string pointed to by s2.

Returns

The strcmp function returns an integer greater than, equal to, or less than zero, accordingly as the string pointed to by s1 is greater than, equal to, or less than the string pointed to by s2.

It is therefore pretty clear that using anything other than the sign of the returned value is a poor practice.

NPE