20

I understand that if you pass "cat" (string1) and "dog" (string2) to strcmp (this is a C question), the return value is less than 0, since "cat" is lexically less than "dog".

However, I am not sure what would happen with strcmp if this happened:

string1: "dog"
string2: "dog2"

What would strcmp return? Less than zero, zero, or greater than zero? For context, I am trying to write a comparator function that compares strings, and I would like to account for strings that start with the same characters. One string may have an extension (such as the '2' in "dog2" in the example above).

EDIT: This is not a duplicate question. The question that this is allegedly similar to asks what the return value represents. I am asking what happens when the strings are identical up to a point, but then one of them stops whilst the other continues.

Daniel Soutar
  • 4
    Why not simply try it? – Some programmer dude Apr 09 '16 at 15:36
  • 1
    Possible duplicate of [strcmp() return values in c](http://stackoverflow.com/questions/7656475/strcmp-return-values-in-c) – MicroVirus Apr 09 '16 at 15:39
  • 4
    Because I've found with C that things aren't always consistent. The sizes of types are a good example of this. – Daniel Soutar Apr 09 '16 at 15:40
  • 2
    C is very consistent. If you see inconsistent behavior, then I'm sorry to say that you have probably misunderstood something or are doing something wrong. Or do you mean something like `sizeof(int)`, which depends on the implementation? That is *still* consistent, as it works as specified in the formal C standard. I'll grant that the character encoding can differ between platforms and implementations, but the behavior of strings and of the [character and string functions](http://en.cppreference.com/w/c/string/byte) is still consistent. – Some programmer dude Apr 09 '16 at 16:06

5 Answers

13

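The result is decided by the first octet that differs. Many implementations return the difference between those two octets, but only the sign of the result is guaranteed. In your example '\0' < '2', so something negative is returned.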
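As a minimal sketch of how to rely on this safely, the helper below tests only the sign of the result. The name `sorts_before` is hypothetical and chosen for illustration; it is not part of any library.

```c
#include <string.h>

/* Hypothetical helper: returns 1 if a sorts before b, 0 otherwise. */
int sorts_before(const char *a, const char *b)
{
    /* Test only the sign; the magnitude of a nonzero result is unspecified. */
    return strcmp(a, b) < 0;
}
```

With a conforming strcmp, `sorts_before("dog", "dog2")` is 1, because the terminating '\0' of "dog" is compared against '2'.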

hroptatyr
7

The C standard defines only the sign of the result: it is the sign of the difference between the first pair of characters that differ. The actual value varies widely between implementations. The only common ground is that the return value is zero for equal strings, and <0 or >0 for str1<str2 and str1>str2 respectively. From ISO/IEC 9899:201x, §7.23.4 Comparison functions:

The sign of a nonzero value returned by the comparison functions memcmp, strcmp, and strncmp is determined by the sign of the difference between the values of the first pair of characters (both interpreted as unsigned char) that differ in the objects being compared.

But some implementations take care to return the typical values 0, 1 and -1. See e.g. the Apple implementation (http://opensource.apple.com//source/Libc/Libc-262/ppc/gen/strcmp.c):

int
strcmp(const char *s1, const char *s2)
{
    for ( ; *s1 == *s2; s1++, s2++)
        if (*s1 == '\0')
            return 0;
    return ((*(unsigned char *)s1 < *(unsigned char *)s2) ? -1 : +1);
}

EDIT: In the Android boot library for the Donut release (https://android.googlesource.com/platform/bootable/bootloader/legacy/+/donut-release/libc/strcmp.c), the function returns 0 if the strings are equal and 1 in the other two cases, so it reports only equality and not ordering. It uses only logical operations:

int strcmp(const char *a, const char *b)
{
    while(*a && *b) {
        if(*a++ != *b++) return 1;
    }
    if(*a || *b) return 1;
    return 0;
}
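Because the magnitude of a nonzero result differs between libraries, code that wants a fixed set of values can normalize the result itself. The following is a sketch only, assuming a conforming strcmp underneath; `strcmp_sign` is a hypothetical name.

```c
#include <string.h>

/* Hypothetical helper: collapses any strcmp result to -1, 0, or +1. */
int strcmp_sign(const char *a, const char *b)
{
    int r = strcmp(a, b);

    /* (r > 0) and (r < 0) each evaluate to 0 or 1, so the difference
       is +1 for positive r, -1 for negative r, and 0 otherwise. */
    return (r > 0) - (r < 0);
}
```

This keeps the ordering information the standard guarantees while hiding the implementation-specific magnitude.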
Frankie_C
2

C11 quotes

C11 N1570 standard draft

I think "dog" < "dog2" is guaranteed by the following quotes:

7.24.4 Comparison functions, paragraph 1: The sign of a nonzero value returned by the comparison functions memcmp, strcmp, and strncmp is determined by the sign of the difference between the values of the first pair of characters (both interpreted as unsigned char) that differ in the objects being compared.

So the chars are compared by their numeric values, and '\0' is guaranteed to be 0 (see 5.2.1 further down).

Then:

7.24.4.2 The strcmp function, paragraph 2: The strcmp function compares the string pointed to by s1 to the string pointed to by s2.

says that, obviously, strings are compared, and:

7.1.1 Definitions of terms 1 A string is a contiguous sequence of characters terminated by and including the first null character.

says that the null is part of the string.

Finally:

5.2.1 Character sets 2 [...] A byte with all bits set to 0, called the null character, shall exist in the basic execution character set; it is used to terminate a character string.

so '\0' is equal to zero.

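Since the characters are interpreted as unsigned char, and every character other than the null character has a nonzero value, zero is the smallest possible value. The terminating null of "dog" is therefore less than the '2' of "dog2", so "dog" compares less.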
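To make the argument concrete, here is an illustrative re-implementation that follows the quoted wording. It is a sketch under the assumptions above, not the code of any actual library, and `my_strcmp` is a hypothetical name.

```c
/* Illustrative sketch only: compares two strings as sequences of
   unsigned char values, including the terminating null character. */
int my_strcmp(const char *s1, const char *s2)
{
    const unsigned char *p1 = (const unsigned char *)s1;
    const unsigned char *p2 = (const unsigned char *)s2;

    /* Advance while the characters match and s1 has not ended. */
    while (*p1 != '\0' && *p1 == *p2) {
        p1++;
        p2++;
    }

    /* Here either the characters differ or both are '\0'.
       Return -1, 0, or +1 according to the unsigned comparison. */
    return (*p1 > *p2) - (*p1 < *p2);
}
```

For "dog" and "dog2" the loop stops at the fourth position, where '\0' (value 0) is compared with '2', so the result is negative.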

Ciro Santilli OurBigBook.com
1

From man strcmp:

The strcmp() and strncmp() functions return an integer less than, equal to, or greater than zero if s1 (or the first n bytes thereof) is found, respectively, to be less than, to match, or be greater than s2.

This would normally be implemented like @hroptatyr describes.
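Since the question is about writing a comparator, here is one way it could look when used with qsort on an array of string pointers. This is a sketch; `compare_strings` and `sort_strings` are hypothetical names, and the array is assumed to hold valid null-terminated strings.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical qsort comparator for an array of const char * elements.
   qsort passes pointers to the elements, i.e. pointers to the pointers. */
int compare_strings(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;

    /* qsort only inspects the sign, which is what strcmp guarantees. */
    return strcmp(*sa, *sb);
}

/* Hypothetical wrapper: sorts count strings in ascending strcmp order. */
void sort_strings(const char **items, size_t count)
{
    qsort(items, count, sizeof items[0], compare_strings);
}
```

With this ordering, "dog" is placed before "dog2", because a string that is a proper prefix of another compares less.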

Lee Taylor
totoro
0

If you want to compare just the initial len characters of two strings, use strncmp instead of strcmp:

#include <string.h>
int main(void) {
    size_t len = 3;  /* compare at most the first 3 characters */
    int res = strncmp("dog", "dog2", len);
    return res;
}

res will be 0 in this case.
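If the length should follow the shorter string rather than being hard-coded, a prefix test can be sketched as below. The name `has_prefix` is hypothetical, introduced only for this example.

```c
#include <string.h>

/* Hypothetical helper: returns 1 if s begins with prefix, 0 otherwise. */
int has_prefix(const char *s, const char *prefix)
{
    /* Compare only as many characters as the prefix contains.
       strncmp stops early if s ends first, so this is safe for short s. */
    return strncmp(s, prefix, strlen(prefix)) == 0;
}
```

Here `has_prefix("dog2", "dog")` is 1, while `has_prefix("dog", "dog2")` is 0, because the fourth character of "dog" is the terminating null rather than '2'.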

stark