Why is it wrong to modify the contents of a pointer to a string literal?

Question

If I write:

char *aPtr = "blue"; //would be better const char *aPtr = "blue"
aPtr[0]='A';

I get a warning. The code above may appear to work, but it isn't standard: it has undefined behavior, because the pointer refers to a string literal, which may live in read-only memory. The question is: why is it like this? By contrast, this code:

char a[]="blue";
char *aPtr=a;
aPtr[0]='A';

is OK. I want to understand what happens under the hood.

Neither the question of which this was closed as a duplicate nor the one mentioned by @HappyCoder is really similar to this one. Both of them deal with what type a string literal has. This is asking **why** the string literal has that type. — Jerry Coffin, Sep 28 '15 at 11:27
Not just "would be better"; "would be legal". Y'know this has been asked a million times. — Lightness Races in Orbit, Sep 28 '15 at 11:31
http://stackoverflow.com/questions/32807364/char-char-and-const-char-stack-code-segment-and-compiler-behavior — DawidPi, Sep 28 '15 at 11:34
possible duplicate of [Why do compilers allow string literals not to be const?](http://stackoverflow.com/questions/3075049/why-do-compilers-allow-string-literals-not-to-be-const) — DawidPi, Sep 28 '15 at 11:36

score 1 · Answer 1 · edited May 23 '17 at 12:29

The first is a pointer to a read-only array of characters created by the compiler and typically placed in a read-only section of the program. You must not modify the characters at that address, because the language treats them as read-only.

The second creates an array and copies each element from the initializer (see this answer for more details on that). You can modify the contents of the array, because it's a simple variable.

The first one works the way it does because doing anything else would require dynamically-allocating a new variable, and would require garbage collection to free it. That is not how C and C++ work.
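A minimal sketch of that difference, assuming a hosted C compiler. The function name `modify_array_copy` and the parameter `out` are made up for illustration and do not come from the question.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Builds a modifiable array copy of "blue", changes its first character,
   copies the result into out, and returns the size of the local array
   (four letters plus the terminating '\0').
   out must have room for at least 5 chars. */
size_t modify_array_copy(char *out)
{
    const char *p = "blue"; /* p only points at the literal's storage */
    char a[] = "blue";      /* a is its own array, initialized by copying */

    a[0] = 'A';             /* fine: a is an ordinary modifiable object */
    /* p[0] = 'A';             rejected by the compiler: p is const char * */

    assert(strcmp(p, "blue") == 0); /* the literal itself is untouched */

    memcpy(out, a, sizeof a);
    return sizeof a;
}
```

Declaring the pointer as `const char *` turns the mistake into a compile-time error instead of undefined behavior at run time, which is why the question's own comment suggests it.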

answered Sep 28 '15 at 11:30

Jonathan Wakely


OK, if I understood correctly, in the first case the compiler puts the string literal in the constants area, hence it is impossible to modify the value. In the second case, by contrast, the array is on the stack. Is that right? – Nick Sep 28 '15 at 11:37
Yes. In both cases the variable you declare is on the stack, but different types of variables. In the first case that variable is just the pointer `p`, which _points_ to a constant array stored elsewhere. In the second case the variable is the array `a` and that doesn't point anywhere, it is an array on the stack. You can modify `a[0]` because that is part of the variable. You cannot modify `p[0]` because it is a constant. – Jonathan Wakely Sep 28 '15 at 11:39

Jerry Coffin · Answer 2 · 2015-09-28T12:41:42.270

The primary reason that string literals can't be modified (without undefined behavior) is to support string literal merging.

Long ago, when memory was much tighter than today, compiler authors noticed that many programs had the same string literals repeated many times--especially things like mode strings being passed to fopen (e.g., f = fopen("filename", "r");) and simple format strings being passed to printf (e.g., printf("%d\n", a);).

To save memory, they'd avoid allocating separate memory for each instance of these strings. Instead, they'd allocate one piece of memory, and point all the pointers at it.
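A small sketch of how one might check whether a given compiler does this. The function name is hypothetical, and the result is deliberately not stated here: the C standard leaves it unspecified whether identical literals occupy distinct arrays, so either answer is legitimate.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if this implementation gave both "r" literals the same
   address (i.e. merged them), 0 if each got its own storage. */
int identical_literals_share_storage(void)
{
    const char *mode1 = "r";
    const char *mode2 = "r";

    /* The contents are equal either way; only the addresses may differ. */
    assert(strcmp(mode1, mode2) == 0);

    return mode1 == mode2;
}
```

The answer can change with the compiler, the optimization level, and whether the two literals sit in the same translation unit, so a program must not rely on it.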

In a few cases, they got even trickier than that, merging literals that weren't even entirely identical. For example, consider code like this:

printf("%s\t%d\n", s, a); /* s: some string, so the arguments match the format */
/* ... */
printf("%d\n", b);

In this case, the string literals aren't entirely identical, but the second one is identical to the end of the first. In this case, they'd still allocate one piece of memory. One pointer would point to the beginning of the memory, and the other to the position of the %d in that same block of memory.

With a possibility (but no requirement for) string literal merging, it's essentially impossible to say what behavior you'll get when you modify a string literal. If string literals are merged, modifying one string literal might modify others that are identical, or end identically. If string literals are not merged, modifying one will have no effect on any other.
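Modifying a real literal is undefined, so the effect can only be simulated. The sketch below uses an ordinary writable array as a stand-in for the one block of memory a merging compiler might emit for the two format strings above. The function and variable names are invented for the illustration.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if a write made through the shorter "literal" is also
   visible through the longer one that shares its storage. */
int write_is_visible_through_both(void)
{
    /* Stand-in for the single merged block of literal storage. */
    char shared_block[] = "%s\t%d\n";

    char *long_format  = shared_block;      /* start of the block */
    char *short_format = shared_block + 3;  /* skips '%', 's', '\t' */

    /* The tail of the long format really is the short format. */
    assert(strcmp(short_format, "%d\n") == 0);

    short_format[1] = 'x';                  /* "patch" only the short one */

    /* The long format has changed too, because the bytes are shared. */
    return strcmp(long_format, "%s\t%x\n") == 0;
}
```

With unmerged storage the same write would leave the longer string alone, which is exactly the ambiguity described above.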

MMUs added another dimension: they allowed memory to be marked as read-only, so attempting to modify a string literal would result in a signal of some sort--but only if the system had an MMU (which was often optional at one time) and also depending on whether the compiler/linker decided to put the string literals in memory they'd marked constant or not.

Since they couldn't define what the behavior would be when you modified a string literal, they decided that modifying a string literal would produce undefined behavior.

The second case is entirely different. Here you've defined an array of char. It's clear that if you define two separate arrays, they're still separate, regardless of content, so modifying one can't possibly affect the other. The behavior is clear and always has been, so doing so gives defined behavior. The fact that the array in question might be initialized from a string literal doesn't change that.
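As a sketch of that guarantee (the function name is hypothetical), two arrays initialized from the same literal are distinct objects, so a write to one never shows up in the other:

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if changing one array left the other untouched. */
int arrays_are_independent(void)
{
    char first[]  = "blue";  /* each array gets its own copy of the text */
    char second[] = "blue";

    char *p1 = first;
    char *p2 = second;
    assert(p1 != p2);        /* two objects, two addresses, always */

    first[0] = 'A';          /* well defined: first is a plain char array */

    return strcmp(first, "Alue") == 0 && strcmp(second, "blue") == 0;
}
```

Unlike the literal case, nothing here depends on what the compiler or linker chose to do with the initializer text.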

Even without considering merging, if literals could be modified what should `char* foo() { return "foo"; } foo()[0] = 'b'; puts(foo());` do? Either you allow the return value of `foo()` to change according to what arbitrary callers do with it, or you need to allocate a new string every time it's called and then garbage collect it somehow. Neither is a good option. — Jonathan Wakely, Sep 28 '15 at 13:04
@JonathanWakely: while it's certainly possible to imagine/postulate (quite a few) *other* reasons, literal merging is fundamentally different when it comes to a discussion of why it was done: literal merging is one that was really discussed (quite heavily) during the original C standardization process that resulted in the original C89/90 standard. — Jerry Coffin, Sep 28 '15 at 15:10
