Understanding C-strings & string literals in C++

Question

I have a few questions I would like to ask about string literals and C-strings.

So if I have something like this:

char cstr[] = "c-string";

As I understand it, the string literal is created in memory with a terminating null byte, say for example starting at address 0xA0 and ending at 0xA8, and from there the address is returned and/or cast to type char [ ], which then points to the address.

It is then legal to perform this:

for (std::size_t i = 0; i < sizeof(cstr) - 1; ++i)  // stop before the terminating '\0'
    cstr[i] = 97+i;

So in this sense, can string literals be modified as long as they are cast to the type char [ ]?

But with regular pointers, I've come to understand that when they are pointed to a string literal in memory, they cannot modify the contents because most compilers mark that allocated memory as "Read-Only" in some lower bound address space for constants.

char * p = "const cstring";
*p = 'A'; // illegal memory write

I guess what I'm trying to understand is: why aren't char * types allowed to point to string literals and modify their contents, the way arrays do? Why are string literals not cast to char * the way they are to char [ ]? If I have the wrong idea here or am completely off, feel free to correct me.

`char * p = "const cstring";` should throw a compilation error, since `"const cstring"` is type `const char*` (specifically so that you don't use it like you're using it in your example) — tylerl, Oct 06 '11 at 03:44
@LightnessRacesinOrbit as of C++11 string literals are of type `const char[N]`, which is perdy much equivalent to `const char*`. You can argue the pedantic details, but that's only adding confusion, not clarity. — tylerl, Jan 27 '19 at 00:35
@tylerl On the contrary, those are completely different types, and pretending otherwise is how confusion is introduced. — Lightness Races in Orbit, Jan 27 '19 at 01:21

score 5 · Answer 1 · answered Oct 06 '11 at 03:41

The bit that you're missing is a little compiler magic where this:

char cstr[] = "c-string";

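Behaves roughly like this:

char *cstr = static_cast<char *>(alloca(strlen("c-string")+1));
memcpy(cstr,"c-string",strlen("c-string")+1);

You don't see that bit, but it's more or less what the code compiles to. (Strictly speaking, cstr remains an array of type char[9] rather than a pointer; the snippet only illustrates the copy.)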
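To make the "hidden copy" concrete, here is a minimal sketch that builds the same buffer both ways and compares them. It uses a plain local array instead of `alloca` (which is non-standard), and the function name `same_as_manual_copy` is made up for illustration.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: returns true if an array initialized from a literal
// matches a buffer that was filled by an explicit memcpy of that literal.
bool same_as_manual_copy() {
    char cstr[] = "c-string";              // compiler-generated copy of the literal

    char manual[sizeof "c-string"];        // 9 bytes: 8 characters + '\0'
    std::memcpy(manual, "c-string", sizeof manual);

    // Both are ordinary local arrays, so writing to them is fine.
    cstr[0] = 'C';
    manual[0] = 'C';

    assert(sizeof cstr == sizeof manual);
    return std::memcmp(cstr, manual, sizeof cstr) == 0;
}
```

Neither write touches the literal itself; each array owns its own bytes.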

tylerl

This is definitely what I have been missing! The answer I chose as the selected answer basically put this into words, but the code is even cleaner ;) so really cstr is a const char * to locally allocated memory on the stack (or possibly the heap) that is a copy and is modifiable, unlike the literal string's values. thanks so much for showing me this. – Bobby Barjasteh Oct 06 '11 at 03:48
It's worth pointing out that in this case `cstr` is *not* a `const char*` but rather a `char*`. A `const char*` is what you get with string literals. That means that you can't go modifying its contents. In contrast, a `char*` means the data is modifiable. Also, it's very definitely on the stack, not the heap, which is why you don't have to call `free()` on it, and why you can't `return` it to your caller. – tylerl Oct 06 '11 at 20:16

score 2 · Answer 2 · answered Oct 06 '11 at 03:36

char cstr[] = "something"; is declaring an automatic array initialized to the bytes 's', 'o', 'm', ...

char * cstr = "something";, on the other hand, is declaring a character pointer initialized to the address of the literal "something".
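A small sketch of the distinction, assuming nothing beyond the standard library: the array gets its own storage, while the pointer just holds the literal's address. The pointer is declared `const char*` here so the snippet also compiles under C++11 and later; `array_has_own_storage` is a made-up name.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: returns true if the array lives at a different
// address than the literal the pointer refers to.
bool array_has_own_storage() {
    const char* p = "something";   // p holds the address of the literal
    char a[] = "something";        // a is a separate 10-byte local array

    assert(std::strcmp(a, p) == 0);   // same characters...

    // ...but stored in two different objects.
    return static_cast<const void*>(a) != static_cast<const void*>(p);
}
```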

Hot Licks

Thanks for the insight on this. I see now that arrays, being a standard type, are literally initialized in their constructor by string literals, but are only copies of the literal itself. – Bobby Barjasteh Oct 06 '11 at 03:52

score 1 · Answer 3 · edited Jan 20 '19 at 13:37

char cstr[] = "c-string";

This copies "c-string" into a char array on the stack. It is legal to write to this memory.

char * p = "const cstring";
*p = 'A'; // illegal memory write

Literal strings like "c-string" and "const cstring" live in the data segment of your binary, in an area that is typically read-only. Above, p points to memory in this area, and writing to that location is undefined behavior. Since C++11 this is enforced more strongly than before: the implicit conversion from a string literal to char* is no longer allowed, so you must declare it as const char* p instead.
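As a rough illustration of the const-correct version: point at the literal through `const char*`, and make a copy when you need something writable. `std::string` is used for the copy here as one option among several, and `capitalized_copy` is a hypothetical name.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: reads the literal through a pointer-to-const and
// modifies an independent copy instead of the literal.
std::string capitalized_copy() {
    const char* p = "const cstring";   // OK: the pointee is const
    // *p = 'A';                       // would not compile: *p is read-only

    std::string s = p;                 // writable copy of the characters
    assert(s.size() == 13);
    s[0] = 'C';                        // changes only the copy
    return s;
}
```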


score 1 · Accepted Answer · answered Oct 06 '11 at 03:38

In the first case you are creating an actual array of characters, whose size is determined by the size of the literal you are initializing it with (8+1 bytes). The cstr variable is allocated memory on the stack, and the contents of the string literal (which is located somewhere else, possibly in a read-only part of memory) are copied into this variable.

In the second case, the local variable p is allocated memory on the stack as well, but its contents will be the address of the string literal you are initializing it with.

Thus, since the string literal may be located in read-only memory, it is in general not safe to try to change it via the p pointer (you may get away with it, or you may not). On the other hand, you can do whatever you like with the cstr array, because that is your local copy that just happens to have been initialized from the literal.

(Just one note: the cstr variable is of type array of char, and in most contexts it converts to a pointer to the first element of that array. One exception is the sizeof operator: it computes the size of the whole array, not just of a pointer to the first element.)
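The sizeof note can be checked with a short sketch. The helper names below are invented for the example; the only assumption is that the literal is the 8-character "c-string" from the question.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: the parameter is a pointer, so sizeof(s) measures
// the pointer itself, not the array the caller passed in.
std::size_t pointer_size_seen_by_callee(const char* s) {
    assert(s != nullptr);
    return sizeof(s);
}

// sizeof applied directly to the array sees the whole array.
std::size_t whole_array_size() {
    char cstr[] = "c-string";
    return sizeof(cstr);   // eight characters plus the terminating '\0'
}

// Passing the array to a function makes it decay to a pointer.
std::size_t size_after_decay() {
    char cstr[] = "c-string";
    return pointer_size_seen_by_callee(cstr);
}
```

The first function reports the size of a pointer on your platform, while `whole_array_size` reports the 9 bytes of the array.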

Ah, I see. So the difference is that one is really just a pointer to the RO memory whilst the other is a variable of array of char (or constant pointer in other words) to a local copy of that literal in memory plus the null byte, and thus access to the copy is RW. Thanks for clarifying — Bobby Barjasteh, Oct 06 '11 at 03:46
Didn't always used to be RO memory, BTW. Modifying a "constant" string was an old C programmer's trick, before they started putting the strings in a read-only section. — Hot Licks, Oct 06 '11 at 11:45
