21

I know that in order to compare two strings in C, you need to use the strcmp() function. But I tried to compare two strings with the == operator, and it worked. I don't know how, because it just compares the addresses of the two strings. It shouldn't work if the strings are different. But then I printed the addresses of the strings:

#include <stdio.h>
#include <stdlib.h>

int main()
{
    char* str1 = "First";
    char* str2 = "Second";
    char* str3 = "First";

    printf("%p %p %p", str1, str2, str3);

    return 0;
}

And the output was:

00403024 0040302A 00403024
Process returned 0 (0x0)   execution time : 0.109 s
Press any key to continue.

How is it possible that str1 and str3 have the same address? They may contain the same string, but they aren't the same variable.

Christian Dean
Drakalex
  • 28
    compiler is smart enough to recognize `str1` and `str3` are the same string, so it just creates one instance of it in read-only memory and thus the pointer to that one instance is the same. – yano Mar 06 '18 at 16:38
  • 2
    Try `char* str4 = "irst";` I would not be surprised if its pointer is `00403025` with your optimization parameters. – Patrick Roberts Mar 06 '18 at 21:03
  • Also try `printf("%p %p %p\n", (void *)&str1, (void *)&str2, (void *)&str3)`. – zwol Mar 06 '18 at 22:02
  • @PatrickRoberts You're right: https://onlinegdb.com/Sks_ysndz – Mark H Mar 06 '18 at 23:08
  • duplicates: [Addresses of two pointers are same](https://stackoverflow.com/q/19088153/995714), [optimisation of string literals](https://stackoverflow.com/q/11399682/995714), [Same strings in array have same memory address](https://stackoverflow.com/q/26433563/995714), [Constant strings address](https://stackoverflow.com/q/1611673/995714)... – phuclv Mar 07 '18 at 04:12
  • You can't infer anything from doing things with no guaranteed result. – philipxy Mar 07 '18 at 06:45
  • @yano That's an answer, not a comment. – pipe Mar 07 '18 at 07:06
  • @yano `str1` and `str2` are not strings. They are pointers. Just saying this to clarify your comment, because this seems to be the source of OPs confusion... There's no problem what so ever for two pointers to point to same address. – hyde Mar 07 '18 at 07:20
  • @hyde Yes, should've said `"First"` and `"First"` are the same string, the compiler is smart enough to realize this, so it creates one instance of this string in (most likely read-only) memory and points `str1` and `str2` to that location. – yano Mar 07 '18 at 07:26
  • I'm interested (but not coding yet in C), does that mean that if you make some changes to `str1` or `str3` the other one will change too ? – Yassine Badache Mar 07 '18 at 07:47
  • @YassineBadache It is undefined behavior to modify a _string literal_, those should really be `const char* str1;`, etc. The difference is in declaration/initialization. `char* str1 = "hello";` (as above) points to a string literal, so any attempt to modify what `str1` points to invokes undefined behavior. However, for `char str2[] = "hello";`, you can safely modify `str2`, just realize that buffer only has 6 bytes total. Take a look [here](https://stackoverflow.com/questions/2589949/string-literals-where-do-they-go) and [here](https://stackoverflow.com/questions/5464183/modifying-string-literal) – yano Mar 07 '18 at 16:28
  • @YassineBadache But to directly answer your question, yes, if multiple pointers point to the same object, and that object is modified, then you will see the modified object no matter which pointer you dereference. That's perfectly legal and desirable in many instances.. just in this particular case, it's undefined behavior to modify the objects (the strings `"First"` and `"Second"`) that these pointers (`str1`, `str2`, `str3`) point to. – yano Mar 07 '18 at 16:35
  • and in my response to @hyde, I mean `str1` and `str3` – yano Mar 07 '18 at 16:36
  • Thank you, that's really interesting ! – Yassine Badache Mar 08 '18 at 08:04

7 Answers

23

There is no guarantee that it will always be like this. Typically, implementations keep a literal pool that stores each distinct string literal only once, so every use of that literal refers to the same address. But an implementation may do it differently; the standard imposes no constraint here.

Now your question: you are looking at the values of two pointers that point to the same string literal. Each occurrence of the literal decayed into a pointer to its first element, and the two pointers hold the same value because, as described in the first paragraph, the implementation stored the literal only once.

Also, I would emphasize casting the argument passed for the %p format specifier to (void*).

user2736738
15

There is an interesting point here. What you actually have is just 3 pointers, all pointing to string literals, which must not be modified. So the compiler is free to create one single string for "First" and have both str1 and str3 point there.

This would be a completely different case:

char str1[] = "First";
char str2[] = "Second";
char str3[] = "First";

I have declared 3 different char arrays initialized from string literals. Test it, and you will see that the compiler has assigned different addresses to the 3 arrays.

What you should remember from this: pointers and arrays are different animals, even if arrays can decay to pointers (more on it in this post from the C FAQ).

Serge Ballesta
10

When a particular string literal appears multiple times in a source file, the compiler may choose to have all instances of that literal point to the same place.

Section 6.4.5 of the C standard, which describes String Literals, states the following:

7 It is unspecified whether these arrays are distinct provided their elements have the appropriate values. If the program attempts to modify such an array, the behavior is undefined.

Where "unspecified behavior" is defined in section 3.4.4 as:

use of an unspecified value, or other behavior where this International Standard provides two or more possibilities and imposes no further requirements on which is chosen in any instance

In your case, the string literal "First" appears twice in the source. So the compiler uses the same instance of the literal for both, resulting in str1 and str3 pointing to the same instance.

As stated above, this behavior is not guaranteed. The two instances of "First" could be distinct from each other, resulting in str1 and str3 pointing to different places. Whether two identical instances of a string literal reside in the same place is unspecified.

dbush
3

String literals, just like const-qualified C99+ compound literals, may be pooled. That means that two different occurrences in the source code might in fact result in only one instance in the running program.
That might even be the case if your target does not support hardware write-protection.

Deduplicator
2

The reason this is so perplexing might be, “But what happens if I set str1[1] = 'u';?” Since it’s unspecified whether str1 == str3 (and whether the address of the literal "world!" is the address of "hello, world!" plus 7), does that also turn str3 into a German prince?

The answer is: maybe. Or maybe it only changes str1, or maybe it silently fails to change either, or maybe it crashes the program because you wrote to read-only memory, or maybe it causes some other subtle bug because it re-used those bytes for yet another purpose, or something else entirely.

The fact that you can even assign a string literal to a char* at all, instead of needing to use const char*, is basically cruft for the sake of decades-old legacy code. The first versions of C did not have const. Some existing compilers let programs change string constants, and some didn’t. When the standards committee decided to add the const keyword from C++ to C, they weren’t willing to break all that code, so they gave compilers permission to do basically anything when a program changes a string literal.

The practical implication of this is: never assign a string literal to a char* that isn’t const. And never assume that string constants do or do not overlap (unless you guarantee this with restrict). That type of code has been obsolete since 1989, and just lets you shoot yourself in the foot. If you want a pointer to a string literal (which might or might not share memory with other constants), store it in a const char* or, better yet, const char* const. That warns you if you try to modify it. If you want an array of char that can be modified (and is guaranteed not to alias any other variable), store it in a char[].

If you think you want to compare strings by their addresses, what you really want is either a hash value or a unique handle.

Davislor
1

To add on to the other answers: this is a technique called string interning where the compiler realizes that the strings are the same and therefore only stores them once. Java tends to do this as well (though, as mentioned by the other poster, it's compiler-dependent).

wolfson
-2

It's because every hardcoded string like "First" and "Second" is present in the "read-only" part of the executable, hence they have an address.

On linux, you can see them by using "objdump -s -j .rodata execfile".

If you display the addresses of the variables str1, str2 and str3 themselves (that is, &str1, &str2 and &str3), you will see that they are different.

Tom's
  • 1
    There might not be any read-only part of the executable, and the behavior could still be the same. – Deduplicator Mar 06 '18 at 16:39
  • Hmm, yes, that displays the address of the STRING, not the address of the VARIABLE. So, as coderredoc says, it depends on the implementation. Usually, on Linux, string literals are in the rodata segment and "duplicate" literals are in fact the same. But again, the behavior is unspecified, so it's not something you can count on, and the same code may display different results elsewhere. – Tom's Mar 06 '18 at 16:41