
Any idea why I get "Maya is not Maya" as a result of this code?

#include <stdio.h>
int main(void)
{
   if ("Maya" == "Maya") 
      printf("Maya is Maya \n");
   else
      printf("Maya is not Maya \n");
}
metamorphosis
raymond

8 Answers


Because you are actually comparing two pointers - use e.g. one of the following instead:

if (std::string("Maya") == "Maya") { /* ... */ } 
if (std::strcmp("Maya", "Maya") == 0) { /* ... */ }

This is because C++03, §2.13.4 says:

An ordinary string literal has type “array of n const char”

... and in your case a conversion to pointer applies.

See also this question on why you can't provide an overload for == for this case.
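A minimal C++11 sketch of the two facts this answer relies on: the literal itself has array type, and the conversion to a pointer is what `==` ends up comparing. The helper `extent_of` is hypothetical and exists only for illustration.

```cpp
#include <cstddef>
#include <type_traits>

// A string literal is an lvalue of array type, so decltype reports a
// reference to "array of 5 const char" (4 letters plus the '\0').
static_assert(std::is_same<decltype("Maya"), const char (&)[5]>::value,
              "\"Maya\" has type const char[5]");

// Unary + forces the same array-to-pointer conversion that == applies,
// leaving only a const char* to the first character.
static_assert(std::is_same<decltype(+"Maya"), const char*>::value,
              "after the conversion only a const char* remains");

// Hypothetical helper: recovers N from an array argument passed by reference.
template <typename T, std::size_t N>
constexpr std::size_t extent_of(T (&)[N]) {
    return N;
}
```

`extent_of("Maya")` yields 5, which would be impossible if the literal were already a pointer.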

Community
Georg Fritzsche
  • And, make sure you include <string>. There's no built-in string type in C++. – Jim Lamb Jul 21 '10 at 19:46
  • C++ - the only language on SO where chapter, paragraph, and verse are routinely quoted. – Paul Nathan Jul 21 '10 at 19:57
  • @Paul: Heh, but then it's also one of the few languages where not following the standard can make for some major headaches. – Georg Fritzsche Jul 21 '10 at 20:02
  • @Georg: doesn't *bother* me, I wish other languages' communities had that sort of precision. But it does amuse me. – Paul Nathan Jul 21 '10 at 20:14
  • There aren't many other languages with multiple stable implementations - and even fewer where there isn't a single dominant implementation. I assume C questions quote the standard a lot as well? – JoeG Jul 21 '10 at 21:11
  • @Joe: Not that much - but then it also has far fewer intricacies. – Georg Fritzsche Jul 22 '10 at 03:10
  • It's also my impression that the C community is much more pragmatic ("if it works in the real world, who cares if it's UB") – jalf Jul 22 '10 at 10:45

You are not comparing strings, you are comparing pointer address equality.

To be more explicit -

"foo baz bar" implicitly defines an anonymous const char[m]. It is implementation-defined whether identical string literals are stored at the same location in memory (a concept referred to as interning).

The function you want - in C - is strcmp(const char*, const char*), which returns 0 on equality.
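A short C++ sketch of the difference between comparing addresses and comparing contents. Because literal pooling is implementation-defined, it uses two named arrays, which are guaranteed to be distinct objects; the helper names `same_address` and `same_contents` are hypothetical.

```cpp
#include <cstring>

// Two named arrays are distinct objects, so their addresses always differ,
// even though they hold the same characters.
const char first[] = "Maya";
const char second[] = "Maya";

// Address comparison: true only if both pointers refer to the same char.
bool same_address(const char* a, const char* b) { return a == b; }

// Content comparison: std::strcmp returns 0 when the characters match.
bool same_contents(const char* a, const char* b) {
    return std::strcmp(a, b) == 0;
}
```

Here `same_address(first, second)` is false while `same_contents(first, second)` is true; with two literals in place of the arrays, the first result would depend on the compiler.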

Or, in C++, what you might do is

#include <string>

std::string s1 = "foo";
std::string s2 = "bar";

and then compare s1 vs. s2 with the == operator, which is defined in an intuitive fashion for strings.
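A minimal sketch of that approach, assuming C++11 or later. `std::string`'s `operator==` compares characters, and the standard library also provides overloads taking a `const char*` on either side, so a literal can be compared directly. The function names are hypothetical.

```cpp
#include <string>

std::string s1 = "foo";
std::string s2 = "bar";

// Compares the characters of two std::string objects.
bool s1_equals_s2() { return s1 == s2; }

// Overloads of operator== also accept a const char* on either side.
bool s1_equals_literal() { return s1 == "foo"; }
bool literal_equals_s1() { return "foo" == s1; }
```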

Paul Nathan

The output of your program is implementation-defined.

A string literal has the type const char[N] (that is, it's an array). Whether or not each string literal in your program is represented by a unique array is implementation-defined. (§2.13.4/2)

When you do the comparison, the arrays decay into pointers (to the first element), and you do a pointer comparison. If the compiler decides to store both string literals as the same array, the pointers compare true; if they each have their own storage, they compare false.
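The decay described above can be made visible with a small sketch. A named array stands in for the literal (the conversion works the same way); the names `word`, `decayed`, `array_bytes` and `pointer_bytes` are illustrative only.

```cpp
#include <cstddef>

// A named array standing in for a string literal; both decay the same way.
const char word[] = "Maya";

// Array-to-pointer conversion: the pointer holds the address of word[0].
// Each operand of the question's == is converted like this.
const char* decayed = word;

// The array type still carries its size; the pointer carries only an address.
constexpr std::size_t array_bytes = sizeof(word);       // 5: "Maya" plus '\0'
constexpr std::size_t pointer_bytes = sizeof(decayed);  // sizeof(const char*)
```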

To compare strings, use std::strcmp(), like this:

if (std::strcmp("Maya", "Maya") == 0) // same

Typically you'd use the standard string class, std::string. It defines operator==. You'd need to make one of your literals a std::string to use that operator:

if (std::string("Maya") == "Maya") // same
GManNickG

What you are doing is comparing the address of one string with the address of another. Depending on the compiler and its settings, sometimes the identical literal strings will have the same address, and sometimes they won't (as apparently you found).
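A sketch for checking what a given build does, assuming C++11. It prints the two addresses and whether they matched; either outcome is permitted, so the printed result may change with the compiler or its optimization settings. The struct and function names are hypothetical.

```cpp
#include <cstdio>

// Holds the addresses of two identical literals.
struct LiteralAddresses {
    const char* first;
    const char* second;
};

// Prints both addresses and whether this build pooled the two literals.
LiteralAddresses show_literal_addresses() {
    LiteralAddresses r{"Maya", "Maya"};
    std::printf("first = %p, second = %p, pooled: %s\n",
                static_cast<const void*>(r.first),
                static_cast<const void*>(r.second),
                r.first == r.second ? "yes" : "no");
    return r;
}
```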

Gabe

Any idea why I get "Maya is not Maya" as a result

Because in C, and thus in C++, string literals are of type const char[], which is implicitly converted to const char*, a pointer to the first character, when you try to compare them. And pointer comparison is address comparison.
Whether the two string literals compare equal or not depends on whether your compiler (using your current settings) pools string literals. It is allowed to do that, but it doesn't need to.

To compare the strings in C, use strcmp() from the <string.h> header. (It's std::strcmp() from <cstring> in C++.)
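Note that strcmp() reports an ordering, not just equality, and only the sign of its result is specified. A small sketch (the wrapper name `compare_sign` is hypothetical) that normalizes the result to -1, 0 or 1:

```cpp
#include <cstring>

// std::strcmp returns zero for equal strings, a negative value if the first
// differing character of a is smaller than that of b (compared as unsigned
// char), and a positive value otherwise. Only the sign is specified.
int compare_sign(const char* a, const char* b) {
    int r = std::strcmp(a, b);
    return (r > 0) - (r < 0);
}
```

So `compare_sign("Maya", "Maya")` is 0, and `compare_sign("May", "Maya")` is -1 because the terminator of the shorter string sorts first.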

To do so in C++, the easiest is to turn one of them into a std::string (from the <string> header), which comes with all comparison operators, including ==:

#include <iostream>
#include <string>

// ...

if (std::string("Maya") == "Maya") 
   std::cout << "Maya is Maya\n";
else
   std::cout << "Maya is not Maya\n";
sbi
  • Well, the literal is a `const char[N]`. (Note the `const`.) The conversion to a `char*` is allowed but deprecated. – GManNickG Jul 21 '10 at 19:55
  • @GMan: I think I was taught here, a few months ago, that a string literal is an rvalue of type `char[]`. You can use it to initialize a `char[]`. You can, however, only form `const` pointers to rvalues. Did I get this wrong? – sbi Jul 21 '10 at 20:03
  • The character array initialization is covered by a separate rule, *§8.5.2*. – Georg Fritzsche Jul 21 '10 at 20:08
  • @sbi: A literal is a const char*. However, it has special rules (much like NULL) that allow it to convert to char* (legacy pre-const code) and char[] (copies the contents). The literal itself, however, is const char*. It's easy to see this on any C++0x compiler, use auto, get const char*. – Puppy Jul 21 '10 at 20:13
  • @DeadMG: Nope, a literal has type `const char[N]` to which separate conversions to pointer and initialization rules apply. – Georg Fritzsche Jul 21 '10 at 20:15
  • @sbi: Like @Georg says, array initialization is a different rule. Literals are just the `"xxx"` part, and have static storage. (And the type previously mentioned.) – GManNickG Jul 21 '10 at 20:16
  • @GMan: Then why does automatic type deduction call them pointers? Surely they'd just be deduced to const char[]. – Puppy Jul 21 '10 at 20:33
  • @DeadMG: It depends on your context. For example: `auto x = "asd"` makes the type of `x` equal to `const char*`. However, `auto& x = "asd"` makes it `const char[4]`. – GManNickG Jul 21 '10 at 20:38
  • @GMan: Whether literals are constant objects or non-constant rvalues. – sbi Jul 22 '10 at 07:38
  • @sbi: Pretend that a string literal is a variable name. It's the name of some statically allocated array that contains the contents of that string. So `"abc"` is the "name" of a variable defined exactly like this: `static const char "abc"[4] = {'a', 'b', 'c', '\0'};`. (Note, I treat "abc" as the variable name.) – GManNickG Jul 22 '10 at 07:53
  • @GMan: And then what is `42`? A `const int`? – sbi Jul 22 '10 at 09:13
  • @sbi: Yes, though it's just the value and not stored anywhere, if I'm reading correctly. (Though again it's probably safe to think of `42` as the name of some variable: `static const int 42 = 42`. As you can see, that way is a bit more redundant, and doesn't make too much sense in this regard. This is because the variable is assigned the value of the literal 42, which we're trying to define in the first place.) – GManNickG Jul 22 '10 at 15:46
  • @GMan: I think it was here, and it was just a few months ago, that it was explained to me that literals are rvalues, not const objects. ICBWT, I have an old head and it's already overflowing. And, of course, I can't find this now... – sbi Jul 22 '10 at 17:28
  • @sbi: Heh, well this is only my understanding, I could be wrong. If you find that explanation I'd like to read it. :) – GManNickG Jul 22 '10 at 17:42
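GManNickG's point about `auto` versus `auto&` can be checked at compile time. A minimal C++11 sketch; the variable names are illustrative.

```cpp
#include <type_traits>

// auto deduces by value, so the array-to-pointer conversion applies...
auto by_value = "asd";
static_assert(std::is_same<decltype(by_value), const char*>::value,
              "auto yields const char*");

// ...while auto& binds to the literal itself, an array of 4 const char.
auto& by_reference = "asd";
static_assert(std::is_same<decltype(by_reference), const char (&)[4]>::value,
              "auto& yields const char(&)[4]");
```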

My compiler says they are the same ;-)

Even worse, my compiler is certainly broken. This very basic equation:

printf("23 - 523 = %d\n","23"-"523");

produces:

23 - 523 = 1
mvds
  • @KevinDTimm: No, that means his compiler is using a merged constant string pool. This is often a compiler option (see for example `-fmerge-constants` in gcc). – Greg Hewgill Jul 21 '10 at 19:49
  • Your compiler recognizes them statically and reuses the same storage location. When the comparison is done, the addresses are the same. – Kirk Kelsey Jul 21 '10 at 19:49
  • @Kevin: No, then his compiler is fine. It's implementation defined whether or not string literals must be stored distinctly. – GManNickG Jul 21 '10 at 19:49
  • yes it must be ;-) The most surprising thing to me is that the OP's compiler doesn't see the strings are identical! – mvds Jul 21 '10 at 19:50
  • @GMan do you know what the logic is behind distinct storage? – mvds Jul 21 '10 at 19:52
  • @GMan you apparently cannot rely on identical strings in C being stored separately (given the differences we see here), so then it wouldn't make sense to make a compiler/linker which doesn't optimize such basic things out. – mvds Jul 21 '10 at 20:00
  • @mvds: I'm sure there might be reasons, who knows. :) – GManNickG Jul 21 '10 at 20:04
  • @GMan too bad you sounded like you knew ;-) -fmerge-constants does have a function in gcc, you need it to make the arithmetic above "work" – mvds Jul 21 '10 at 20:09
  • (I mean without -fmerge-constants the "maya"=="maya" works already) – mvds Jul 21 '10 at 20:10

C and C++ do this comparison via pointer comparison; looks like your compiler is creating separate resource instances for the strings "Maya" and "Maya" (probably due to having an optimization turned off).

Paul Sonier

Indeed, "because your compiler, in this instance, isn't using string pooling," is the technically correct, yet not particularly helpful answer :)

This is one of the many reasons the std::string class in the C++ Standard Library now exists to replace this earlier kind of string when you want to do anything useful with strings in C++, and is a problem pretty much everyone who's ever learned C or C++ stumbles over fairly early on in their studies.

Let me explain.

Basically, back in the days of C, all strings worked like this. A string is just a bunch of characters in memory. A string you embed in your C source code gets translated into a bunch of bytes representing that string in the running machine code when your program executes.

The crucial part here is that a good old-fashioned C-style "string" is an array of characters in memory. That block of memory is often referred to by means of a pointer -- the address of the start of the block of memory. Generally, when you're referring to a "string" in C, you're referring to that block of memory, or a pointer to it. C doesn't have a string type per se; strings are just a bunch of chars in a row.

When you write this in your code:

"wibble"

Then the compiler provides a block of memory that contains the bytes representing the characters 'w', 'i', 'b', 'b', 'l', 'e', and '\0' in that order. (The compiler adds a zero byte at the end, a "null terminator". In C, a standard string is a null-terminated string: a block of characters starting at a given memory address and continuing until the next zero byte.)
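A small sketch of that layout, assuming C++11: the array behind "wibble" has seven elements, and code that treats it as a C string walks the bytes until it reaches the terminator. The function `c_string_length` is a hypothetical stand-in for what `std::strlen` does.

```cpp
#include <cstddef>

// Six letters plus the '\0' the compiler appends.
const char wibble[] = "wibble";
static_assert(sizeof(wibble) == 7, "six letters + null terminator");

// Counts characters up to, but not including, the first zero byte.
std::size_t c_string_length(const char* s) {
    std::size_t n = 0;
    while (s[n] != '\0') {
        ++n;
    }
    return n;
}
```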

And when you start comparing expressions like that, what happens is this:

if ("Maya" == "Maya")

At the point of this comparison, the compiler -- in your case, specifically; see my explanation of string pooling at the end -- has created two separate blocks of memory, to hold two different sets of characters that are both set to 'M', 'a', 'y', 'a', '\0'.

When the compiler sees a string in quotes like this, "under the hood" it builds an array of characters, and the string itself, "Maya", acts as the name of that array. Its type is therefore "array of 5 const char" (const char[5]), not a pointer. But in most expressions, including a comparison with "==", an array is implicitly converted ("decays") to a pointer to its first character, so what the comparison actually sees is a const char*.

When you compare these two expressions using "==", what you're actually comparing is the pointers, the memory addresses of the beginning of these two different blocks of memory. Which is why the comparison is false, in your particular case, with your particular compiler.

If you want to compare two good old-fashioned C strings, you should use the strcmp() function. This will examine the contents of the memory pointed to by both "strings" (which, as I've explained, are handled through pointers to blocks of memory), go through the bytes, comparing them one by one, and return 0 if they're really the same.
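The byte-by-byte walk just described can be sketched as follows. `same_c_string` is a hypothetical helper that answers only the equality question; in real code, call strcmp() (std::strcmp in C++) and test for 0.

```cpp
// Walks both blocks of memory in step until the characters differ or the
// first string ends; the strings are equal only if both end together.
bool same_c_string(const char* a, const char* b) {
    while (*a != '\0' && *a == *b) {
        ++a;
        ++b;
    }
    return *a == *b;
}
```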

Now, as I've said, this is the kind of slightly surprising result that's been biting C beginners on the arse since the days of yore. And that's one of the reasons the language evolved over time. Now, in C++, there is a std::string class, that will hold strings, and will work as you expect. The "==" operator for std::string will actually compare the contents of two std::strings.
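A minimal sketch of that behaviour, assuming C++11: two std::string objects each own their own buffer, so their `data()` pointers differ, yet `==` compares only the characters. The names `a`, `b`, `buffers_differ` and `contents_equal` are illustrative.

```cpp
#include <string>

std::string a = "Maya";
std::string b = "Maya";

// Each std::string manages its own storage, so the buffers are distinct...
bool buffers_differ() { return a.data() != b.data(); }

// ...but operator== compares contents, so the strings are still equal.
bool contents_equal() { return a == b; }
```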

By default, though, C++ is designed to be backwards-compatible with C, i.e. a C program will generally compile and work under a C++ compiler the same way it does in a C compiler, and that means that old-fashioned strings, "things like this in your code", will still end up as pointers to bits of memory that will give non-obvious results to the beginner when you start comparing them.

Oh, and that "string pooling" I mentioned at the beginning? That's where some more complexity might creep in. A smart compiler, to be efficient with its memory, may well spot that in your case, the strings are the same and can't be changed, and therefore only allocate one block of memory, with both of your names, "Maya", pointing at it. At which point, comparing the "strings" -- the pointers -- will tell you that they are, in fact, equal. But more by luck than design!

This "string pooling" behaviour will change from compiler to compiler, and often will differ between debug and release modes of the same compiler, as the release mode often includes optimisations like this, which will make the output code more compact (it only has to have one block of memory with "Maya" in, not two, so it's saved five -- remember that null terminator! -- bytes in the object code.) And that's the kind of behaviour that can drive a person insane if they don't know what's going on :)

If nothing else, this answer might give you a lot of search terms for the thousands of articles that are out there on the web already, trying to explain this. It's a bit painful, and everyone goes through it. If you can get your head around pointers, you'll be a much better C or C++ programmer in the long run, whether you choose to use std::string instead or not!

Matt Gibson
  • "C-style "string" is a pointer" is not correct. A string literal has the type `const char[N]`. Arrays can decay, so it can be safely implicitly converted to `const char*`, and string literals have a special, unsafe, and deprecated conversion to `char*`. – GManNickG Jul 21 '10 at 20:43
  • @GMan I apologise, it's possible I'm trying to aim my answer at too many levels at once, and oversimplifying here and there. I'll try to incorporate your point. – Matt Gibson Jul 22 '10 at 08:42
  • @GMan I've tried to revise my answer to be more precise about what's actually going on, while still not baffling the average beginner C/C++ programmer. I've also marked it as Community Wiki -- if you think you can dive in and improve the accuracy or the clarity, please do! I think it's important to have as full an explanation as possible here, as this is often a confusing point for new C programmers, and I've not really found a complete and clear explanation anywhere else to point at. – Matt Gibson Jul 22 '10 at 09:32