
I had an embarrassing struggle with this simple thing. This program segfaults:

```c
#include <stdio.h>

int main(void)
{
    char *test = "this is a string";
    test[0] = 'q';          /* this line segfaults */
    printf("%s\n", test);
    return 0;
}
```

but this was not:

```c
#include <stdio.h>

int main(void)
{
    char test[] = "this is a string";
    test[0] = 'q';          /* this works */
    printf("%s\n", test);
    return 0;
}
```

After looking at the assembly I noticed that in the first case, the literal "this is a string" was declared in .rodata, so that explains the segfault. But in the second case, the string wasn't in the assembly at all, so I assume it was being linked via the .data section as writable. Why this difference in behavior? Is this obvious and I'm being stupid?

Sam
  • In the second example, the array is on the *stack* like any other local variable. – Some programmer dude Jun 01 '23 at 06:02
  • As for the first example, any decent beginners book or tutorial should have taught you that literal strings are really read-only arrays of characters, including the terminator. Exactly where they are stored doesn't matter, only that you're not allowed to modify them (therefore making them *read only*). By defining a pointer like you do in the first example, you make it point to the first element of that array. The read-only aspect of literal strings is why you should always use `const char *` for pointers to them. – Some programmer dude Jun 01 '23 at 06:04
  • A slight clarification about my first comment: The C specification doesn't mandate (or even mention) any specific location for local variables. It's an implementation detail. But putting automatic variables on the stack makes the compiler's handling of them much easier, so that's what all normal compilers do for normal PC-like systems. – Some programmer dude Jun 01 '23 at 06:06
  • Also note that a definition like `char test[] = "foo";` is really the same as `char test[] = { 'f', 'o', 'o', '\0' };`. Hopefully it helps you see the difference from `char *test = "foo";` – Some programmer dude Jun 01 '23 at 06:08
  • In `char arr[]`, the quoted string is just an array initializer. In `char *ptr`, the quoted string is an actual string literal object. As Someprogrammerdude says, like an anonymous `const char[]` that exists somewhere (in practice in `.rodata`) but you just get its address. – Peter Cordes Jun 01 '23 at 06:10
  • And lastly, an important reminder: Arrays are not pointers, and pointers are not arrays. An array can *decay* to a pointer to its first element, but isn't a pointer in itself. And a pointer is just a pointer to a single object really. – Some programmer dude Jun 01 '23 at 06:13
  • What was the optimization level of the assembly? With `-O0` it's clearly [on the stack](https://godbolt.org/z/MxTYcqvj6). That first big number `2338328219631577204` is `0x2073692073696874`, or converted to ASCII is `" si siht"`, and finally flipped around to account for little endian is `"this is "` the first 8 bytes of the string (quotes added by me in both cases). For optimizations [jacked up](https://godbolt.org/z/c3TzP9nPW) it looks like the string is indeed read in from somewhere else. – yano Jun 01 '23 at 06:29
  • Why would you expect the same behaviour? `*` and `[]` are different things – M.M Jun 01 '23 at 06:48
  • I linked two duplicates: one regarding the `char*` vs `char[]` FAQ, one regarding where string literals are stored in memory. – Lundin Jun 01 '23 at 10:39
  • Peter's comment makes a key point: in `char str[] = "hello";` the string *looks* like a string literal but isn't; it is an array initializer. We could have written `char str[] = { 'h', 'e', 'l', 'l', 'o', '\0' };` and it would be more obvious that the thing following `=` here is an array initializer and there is no string literal, though, for friendliness, the language allows the array initializer to be written as a string. The compiler may copy from an internally generated string literal with `memmove` or an equivalent, assign each character individually, or use a combination of both. – Erik Eidt Jun 01 '23 at 13:08
  • Thanks all, seems like the major distinction is array initializer vs. *actual* string literal. Seems I can't mark a comment as an accepted answer. – Sam Jun 01 '23 at 18:25

1 Answer


Is this obvious? It should be, because it is an important characteristic of the C language: even though, for historical reasons, a string literal is not declared as `const`, it is not mutable. Let us look carefully:

```c
char *test = "this is a string";
```

This (legally, but unwisely) declares a pointer to non-const `char` and makes it point to the first character of a string literal, which is an array you are not allowed to modify. From that point, changing a character through that pointer is Undefined Behaviour, as the standard explicitly states. Here you got a segfault, but the implementation is free to ignore the attempt, exit the program without any message, or even <type your worst nightmare here>...

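A compiler is not required to diagnose this, but some can warn if asked to (GCC, for example, with `-Wwrite-strings`), and warnings are not to be ignored... The safer declaration is `const char *test = "this is a string";`, which turns `test[0] = 'q';` into a compile-time error instead of a runtime crash.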


```c
char test[] = "this is a string";
```

This (correctly) declares a character array and initializes its contents with a copy of the string literal. The language lets you leave the size empty when you provide a string initializer. The array then takes the size of the initializer, including the terminating `'\0'` (17 chars here). From that point, changing a character of the array is a legal operation.

What you should remember:

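  1. string literals must not be modified (even though, in C, their type is not `const`-qualified)
  2. arrays and pointers are different animals. An array holds some data, while a pointer just points to data that lives elsewhere. An array simply decays to a pointer to its first element when you use it as a value. More precisely, the conversion happens in every expression except when the array is the operand of `sizeof` or unary `&`, or is a string literal used to initialize an array.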
Andreas Wenzel
Serge Ballesta
  • String literals are NOT `const` in C. The compiler is not required to warn (and in fact it is non-conforming if the compiler refuses to compile the program). `const` did not even exist in the language when string literals were added, and their type has never been changed. See C11 6.4.5/6: "the array elements have type `char`" – M.M Jun 01 '23 at 06:49
  • @MM: Hope it is better now... – Serge Ballesta Jun 01 '23 at 07:07
  • the thing that is confusing is that in many places `char *` and `char []` mean the same thing, e.g. in function parameter declarations – pm100 Jun 01 '23 at 07:13
  • @SergeBallesta Yep good job – M.M Jun 01 '23 at 08:20
  • From other comments - it seems like this isn't quite right? `char test[] = "test"` is an array initializer, somehow distinct from a string literal, which might explain why the latter was in .rodata (hence not mutable, as you said) and the former wasn't and ended up being mutable in practice - because of course it would be, it's an array, and you can always do elemental assignments on arrays. And for what it's worth, I'm not convinced that it was supposed to be obvious, at least from the perspective of a non-expert. – Sam Jun 01 '23 at 18:24
  • @Sam: What I meant by *it should be obvious* is not that it is a trivial concept, but that it is an essential one. I apologize if that was not clear, but English is not my first language. While I can explain how to code something (technical language), I am less at ease expressing opinions. – Serge Ballesta Jun 02 '23 at 13:54