
It completely escapes me how printf("Hello") could ever print Cello. It challenges my basic understanding of C. But according to the top answer (by Carson Myers) to the following question on Stack Overflow, it seems to be possible. Can you please explain in simple terms how this is possible? Here's what the answer says:

Whenever you write a string in your source, that string is read only (otherwise you would be potentially changing the behavior of the executable--imagine if you wrote char *a = "hello"; and then changed a[0] to 'c'. Then somewhere else wrote printf("hello");. If you were allowed to change the first character of "hello", and your compiler only stored it once (it should), then printf("hello"); would output cello!)

Aforementioned question: Is it possible to modify a string of char in C?

Meathead
  • You understand it wrong. The answer is basically saying: **if** it were possible to modify a string literal, `printf("Hello")` would output `Cello`, which is obviously not what people expect. **IF**. – Yu Hao Oct 24 '14 at 11:12
  • Have you tried it yourself? I never got cello out of almost 100 times I demonstrated and ran C applications!!! – ha9u63a7 Oct 24 '14 at 11:14
  • Did you get this issue? Or are you asking if it can happen? Because in the first case, that's... very strange, and in the second case, the answer you are quoting doesn't say it happens -- it's actually discussing the reason why it doesn't. – rsethc Oct 24 '14 at 11:51

4 Answers

5

Reasons:

  1. Compilers usually store only one copy of identical string literals, so the string literal in char *a = "hello"; and the one in printf("hello") could be at the same memory location.

  2. The answer in your link assumes that the memory used to store string literals is writable, which is typically not the case on modern architectures. It can be writable, however, where there is no memory access protection, e.g. on some embedded architectures or an 80386 running in real mode.

  3. So, when both conditions hold and you modify the string referenced by a, the value printf sees changes as well.

starrify
  • Please write **`const`** `char`, *please*... that's the very root of this particular evil right there: Being too lazy to correctly qualify string literals `const`... – DevSolar Oct 24 '14 at 11:13
  • *Compilers usually store only one copy of identical string literals*--That along with your third point nails it. I didn't know that. Quite helpful information. Thank you. – Meathead Oct 24 '14 at 11:18
  • @DevSolar Sorry but I'm not very clear why it matters that much. I think the cause is how the string literals are stored, not the qualifier of the pointer. – starrify Oct 24 '14 at 11:19
  • @DevSolar You have a point. I assigned a string literal to a character pointer during the pointer's declaration, and then I tried to modify it. It compiled without error but program won't work. So I would henceforth use 'const char' as you suggested, but can you tell me why my program compiled in first place? – Meathead Oct 24 '14 at 11:22
  • @starrify: You'd be surprised how much code broke when compilers started to actually *use* that particular optimization. People didn't use `const` in manuals and tutorials, other people didn't know or didn't bother and wrote to the array anyway, because "it works for me". Then the next compiler update broke the code... and that's not the compiler's fault, but it was a real effort to teach people that. ;-) Like const correctness in general, and many other coding practices, the idea is safety. The compiler is your friend, *use* its abilities to actually *tell* you when you're doing it wrong. – DevSolar Oct 24 '14 at 11:44
  • @DevSolar Thanks and I totally agree with your comments. I thought the OP's asking about something like "why XXX is possible" and I gave explanations. It's true that people shall use the qualified `const` when needed and I didn't suggest it in my answer. Thank you again for pointing this out. :) – starrify Oct 24 '14 at 11:51
4

This is a practical explanation (i.e., not dictated by the C-language standard):

First, you declare char *a = "hello" somewhere in your code.

As a result, the compiler:

  • Generates a constant string "hello" and places it in a read-only memory section within the executable image (typically within the RO data section), but only if it hasn't already done so
  • Replaces char *a = "hello" with char *a = the address of "hello" in memory

Then, you call printf("hello") somewhere else in your code.

As a result, the compiler:

  • Generates a constant string "hello" and places it in a read-only memory section within the executable image (typically within the RO data section), but only if it hasn't already done so
  • Replaces printf("hello") with printf(the address of "hello" in memory)

Now, theoretically (as explained by @Carson Myers), if you could change any of the characters in "hello", then it would affect the result of anything that refers to the data located at the address of that string in memory.

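In practice, because the compiler typically places string literals in a read-only memory section, this is not feasible: the attempted write usually crashes the program (for example with a segmentation fault) instead of changing the string.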

barak manos
4

If you, somewhere in your source, have the string literal "Hello", that ends up in your executable as part of the code / data segment. This should be considered read-only at all times, because compilers are at liberty to optimize multiple occurrences of the same literal into a single entity. You could have multiple instances of "Hello" in your source, and multiple pointers pointing to them, but they could all be pointing to the same address.

ISO/IEC 9899 "Programming languages - C", chapter 6.4.5 "String literals", paragraph 6:

It is unspecified whether these arrays are distinct provided their elements have the appropriate values. If the program attempts to modify such an array, the behavior is undefined.

Thus, any pointer to such a string literal is to be declared as a pointer to constant contents, to make this clear on the source level:

char const * a = "Hello";

Given this definition, a[0] = 'C'; is not a valid operation: you cannot modify an object through a pointer to const, so the compiler issues an error.

However, there is more than one way to "trick" the language. For one, you could cast the pointer:

char const * a = "Hello";
char * b = (char *)a;
b[0] = 'C';

As the above snippet from the standard states, this -- while syntactically correct -- is semantically undefined behaviour. It might even work "correctly" on certain platforms (mostly for historical reasons), and actually print "Cello". It might break on others.

Consider what would happen if your executable is burned into a ROM chip, and executed from there...


I said "historical reasons". In the beginning, there was no const. That is why C defines the type of a string literal as char[] (no const).

Note that:

  • C++98 does define string literals as being const, but allows the implicit conversion to char * (already marked as deprecated).
  • C++03 still allows the deprecated conversion.
  • C++11 no longer allows the conversion without a cast.
DevSolar
  • Thank you :-). I didn't ask a dumb question after all. I was really confused even after much contemplation. – Meathead Oct 24 '14 at 11:25
  • I've also found the standard quote (in N1570 instead) and was about to update my answer. As you've posted I'd rather not bother it. XD – starrify Oct 24 '14 at 11:28
  • +1 for the only complete answer. A complete answer to this question must mention that modifying string literals is undefined behavior. Code which goes like `char* ptr = "string";` is always incorrect, because it never makes sense to have a non-const pointer to a string literal. – Lundin Oct 24 '14 at 11:54
  • @Lundin Yeah, the second block of code in the answer, related to undefined behavior, was indeed helpful (I had that question in mind, but had not asked about the same in the present question) – Meathead Oct 24 '14 at 12:22
1

The pointer a points to a different "Hello" than the one that you pass to printf (you have two "hello" strings in your system).

It will work if you ask printf to print the string at a.

Dani
  • The above answers say there will be one copy of "hello" – Meathead Oct 24 '14 at 11:26
  • This is not necessarily true. It might be the very same "hello" if the compiler uses pooling of string literals, which is very common. – Lundin Oct 24 '14 at 11:52
  • @Lundin You addressed your comment to Dani right? You mean during pooling of string literals, only single copies of strings are used, right? – Meathead Oct 24 '14 at 12:23
  • @Meathead Yes and yes. – Lundin Oct 24 '14 at 12:31
  • It might be a specific compiler optimization. Yet as mentioned above, even if the compiler optimizes like that, it will guard the intended behaviour by making the string read-only. You can't change it. – Dani Oct 25 '14 at 06:34