33

I was playing around with strcmp when I noticed this. Here is the code:

#include <string.h>
#include <stdio.h>

int main(){

    //passing strings directly
    printf("%d\n", strcmp("ahmad", "fatema"));

    //passing strings as pointers 
    char *a= "ahmad";
    char *b= "fatema";
    printf("%d\n",strcmp(a,b));

    return 0;

}

the output is:

-1
-5

Shouldn't strcmp work the same in both cases? Why do I get a different value when I pass the strings directly as "ahmad" than when I pass them as char* a = "ahmad"? When you pass values to a function, they are allocated on its stack, right?

Shafik Yaghmour
Ahmad AL-wazzan
  • I can't reproduce it: http://ideone.com/SJFI7V. Do you have `#include <string.h>`? – Barmar Jan 03 '15 at 02:44
  • @Barmar if he didnt, it wouldn't compile. – Borgleader Jan 03 '15 at 02:44
  • 1
    Compiling it with `gcc -O0 -g3`, it happens to me too. – Iharob Al Asimi Jan 03 '15 at 02:45
  • 1
    [Works well for me!](http://ideone.com/lWkO77) – πάντα ῥεῖ Jan 03 '15 at 02:45
  • @Borgleader Wouldn't it just use a default prototype for `strcmp()`? – Barmar Jan 03 '15 at 02:45
  • 2
    @Barmar I included `<string.h>` in my case, and the OP is right. – Iharob Al Asimi Jan 03 '15 at 02:46
  • 11
    The behavior is correct. The return value is negative in both cases. What is the problem here? – davmac Jan 03 '15 at 02:47
  • Using the strings `"ahmad"` and `"xbahmad"`, the result was worse: `-1` and `-23`. – Iharob Al Asimi Jan 03 '15 at 02:47
  • @iharob Do you have a [coliru](http://coliru.stacked-crooked.com/) sample? – πάντα ῥεῖ Jan 03 '15 at 02:48
  • @davmac it gives the difference between characters; now it's different, so that's the problem – Hackaholic Jan 03 '15 at 02:48
  • 17
    @Hackaholic the two different return values have precisely the same meaning according to the definition of the function. – davmac Jan 03 '15 at 02:51
  • @davmac yeaa right, POSIX def say that :) – Hackaholic Jan 03 '15 at 02:55
  • The side effect that `strcmp(a, b) == strcmp("ahmad", "fatema")` is `0` is funky, though. – Wintermute Jan 03 '15 at 03:01
  • 3
    @Wintermute - Since the standard doesn't give any guarantees on the return value from `strcmp` other than that it is negative, zero, or positive, the only time you can truly rely on `strcmp(a,b) == strcmp(c,d)` being true is if both comparisons yield zero. – David Hammen Jan 03 '15 at 03:42
  • 1
    @iharob looks like runtime version calculates difference between ASCII characters (e.g. `f-a = 5`, `x-a` = 23) and returns if not 0. – mip Jan 03 '15 at 03:45
  • @David Hammen - So it appears, but it's not exactly POLA-compliant. You'd rather expect `strcmp` to be a pure function. Yes, I know, the standard has no concept of pure functions, and I'm not arguing that this behavior is not standard-compliant, but it is potentially quite surprising and funky. – Wintermute Jan 03 '15 at 03:48
  • @abligh that is not a duplicate as it does not deal with different results from seemingly the same strings. It is just about how `strcmp` works and is not even a really good question either. – Shafik Yaghmour Jan 03 '15 at 12:49
  • @BenVoigt given the criteria [here](http://meta.stackoverflow.com/a/277129/1708801), wouldn't it make more sense to close that question as a duplicate of this one? This question has more views and votes, and I feel like the answers below are equally complete but more distinct. Clearly I am biased, but I always feel conflicted about the handling of duplicates or potential duplicates, so I'm not really sure on this one. – Shafik Yaghmour Jan 04 '15 at 05:28

2 Answers

48

TL;DR: Use gcc -fno-builtin-strcmp so strcmp() isn't treated as equivalent to __builtin_strcmp(). With optimization disabled, GCC will only be able to do constant-propagation within a single statement, not across statements. The actual library version subtracts the differing characters; the compile-time evaluation probably normalizes the result to 1 / 0 / -1, which isn't required or guaranteed by ISO C.


You are most likely seeing the result of a compiler optimization. If we test the code using gcc on godbolt at the -O0 optimization level, we can see that for the first case it does not call strcmp:

movl    $-1, %esi   #,
movl    $.LC0, %edi #,
movl    $0, %eax    #,
call    printf  #

Since you are using constants as arguments to strcmp, the compiler is able to perform constant folding: it calls a compiler intrinsic at compile time and generates the -1 then, instead of calling strcmp at run time. The run-time strcmp is implemented in the standard library and will have a different implementation than the compile-time one, which is likely simpler.

In the second case it does generate a call to strcmp:

call    strcmp  #
movl    %eax, %esi  # D.2047,
movl    $.LC0, %edi #,
movl    $0, %eax    #,
call    printf  #

This is consistent with the fact that gcc has a builtin for strcmp, which is what gcc will use during constant folding.

If we further test using the -O1 optimization level or greater, gcc is able to fold both cases, and the result will be -1 for both:

movl    $-1, %esi   #,
movl    $.LC0, %edi #,
xorl    %eax, %eax  #
call    printf  #
movl    $-1, %esi   #,
movl    $.LC0, %edi #,
xorl    %eax, %eax  #
call    printf  #

With more optimization options turned on, the optimizer is able to determine that a and b point to constants known at compile time, so it can compute the result of strcmp at compile time for this case as well.

We can confirm that gcc is using a builtin function by building with the -fno-builtin flag and observing that a call to strcmp will be generated for all cases.

clang is slightly different in that it does not fold at all using -O0 but will fold at -O1 and above for both.

Note that any negative result is entirely conformant. We can see this by going to the draft C99 standard, section 7.21.4.2 The strcmp function, which says (emphasis mine):

int strcmp(const char *s1, const char *s2);

The strcmp function returns an integer greater than, equal to, or less than zero, accordingly as the string pointed to by s1 is greater than, equal to, or less than the string pointed to by s2.

technosaurus points out that strcmp is specified to treat the strings as if they were composed of unsigned char; this is covered in C99 under 7.21.1, which says:

For all functions in this subclause, each character shall be interpreted as if it had the type unsigned char (and therefore every possible object representation is valid and has a different value).

Peter Cordes
Shafik Yaghmour
  • 1
    While this is interesting, it is not really important to the answer. Even if strcmp were called in both cases, it would be perfectly valid for it to return different values for the same input string (so long as the return values were both the same sign etc). – davmac Jan 03 '15 at 02:53
  • 11
    @davmac, seems like the OP wanted to know why the values are different (even though they are both negative) – asimes Jan 03 '15 at 02:56
  • 3
    @asimes my point is that this is only part of the explanation as to why they are different. Or, if you prefer, it is one explanation of why they could be different (remember that different compilers, platforms etc might give different results). But the question implies that the OP does not understand why they _can_ be different, and that it would be wrong to assume that the numerical result must always be the same for the same input strings. – davmac Jan 03 '15 at 03:01
  • 1
    @davmac It would be allowed for them to be different, but surprising and illogical. Except if you have something like the "Bastard Compiler From Hell"™... – glglgl Jan 03 '15 at 10:50
  • I believe it also mentions that it should be an **unsigned comparison** such that an extended (negative) ascii value is greater than a standard ascii value – technosaurus Jan 03 '15 at 21:25
  • @glglgl it's surprising only if you make assumptions beyond what the spec says. If you use the method as it is intended (to return a value that you then immediately compare with 0), there's no issue. It's only if you start trying to do something else with the return value (eg print it out, as in OP's case) that the behavior becomes surprising. In other words: if you try and do something unusual, you might get surprising results; in which case, it's hardly fair to blame the compiler. – davmac Jan 04 '15 at 00:46
  • @davmac I think most people find this unusual because the result is seemingly inconsistent. Unless you spend a lot of time looking at what optimizers do most experienced developers would have to stop and think for a while to explain why the result of calling what appears to be the same function with the same values gives a different result. No one is suggesting this result is not to specifications nor that anyone should rely on it but it validly makes someone question their understanding which is great, not everyone knows how to dig into these things which is why SO is here. – Shafik Yaghmour Jan 04 '15 at 04:38
  • You can always pass `-fno-builtin-strcmp` if you need your {libc's} implementation behavior. – technosaurus Jan 04 '15 at 07:14
  • @technosaurus that is a good point, I actually used `-fno-builtin` to verify originally but I did not add that detail, I probably will later on when I get some more time. I recently wrote a self-answered question which deals with builtins and constant expressions so I have been thinking about similar stuff a lot recently, you can see it here: [Is it a conforming compiler extension to treat non-constexpr standard library functions as constexpr?](http://stackoverflow.com/q/27744079/1708801) – Shafik Yaghmour Jan 04 '15 at 13:47
  • @ShafikYaghmour I understand why it surprises people. Again, my point is that the cause of this surprise is not a misunderstanding of `strcmp` as such but a more general misunderstanding, a belief that everything in C is simple and straightforward and that it is safe to make assumptions about how things will be compiled and how they will work - but it isn't (as I'm sure you're aware). Don't get me wrong - I like your answer (especially as amended) and I don't think it's incorrect. – davmac Jan 05 '15 at 01:35
14

I think you believe that the value returned by strcmp should somehow depend on the input strings passed to it in a way that is not defined by the function specification. This isn't correct. See for instance the POSIX definition:

http://pubs.opengroup.org/onlinepubs/009695399/functions/strcmp.html

Upon completion, strcmp() shall return an integer greater than, equal to, or less than 0, if the string pointed to by s1 is greater than, equal to, or less than the string pointed to by s2, respectively.

This is exactly what you are seeing. The implementation does not need to make any guarantee about the exact return value, only that it is less than zero, equal to zero, or greater than zero as appropriate.

davmac
  • 3
    So it returns a random negative number then? I think it should be deterministic anyway, and there should be an explanation for the observed behavior. – Iharob Al Asimi Jan 03 '15 at 03:48
  • 8
    @iharob : It is not "random"; in the second case, it is the result of 'a' - 'f', but that is itself a property of the specific `strcmp()` implementation (though I doubt implementations differ in this respect). The first result is explained by Shafik's answer and is compiler (or compiler option) dependent. Either way, you cannot rely on any result other than that guaranteed by the function's standard specification. – Clifford Jan 03 '15 at 03:56
  • @iharob: returning a random negative number would indeed satisfy the spec, but that's not what's happening here. The "explanation for the observed behavior" is that the `strcmp` implementation(s) returned different values on different occasions - in other words, the observed behavior is dependent on the implementation of the function. We could poke into various reasons why an implementation might give different numerical results (see Shafik's answer) but I personally think the key takeaway is "don't make assumptions about behavior that aren't explicit in the spec". – davmac Jan 03 '15 at 05:27