It is well known that one must not modify string literals in C. The spec (section 6.4.5-7) clearly states that modifying a string literal is undefined behaviour. Trying to do so with GCC results in a segfault as the literals get stored in read-only memory.

However, the following code seems to work fine with GCC.

#include <stdio.h>

int main(void) {
  int *arr = (int[]){1,2,3};  /* compound literal: an unnamed, modifiable array */
  arr[1] = 100;
  printf("arr[1]: %i\n", arr[1]);
  return 0;
}

Section 6.5.2.5-5 of the spec shows that the array arr points to would have automatic storage duration, similar to if I had declared int arr[] = {1,2,3}; instead.

Is there a reason why string literals are handled differently?

Akay
  • Well, I'd guess one reason is stated right there in the cited paragraph for string literals: "It is unspecified whether these arrays are distinct provided their elements have the appropriate values". So if you use the same string literal in multiple places, the compiler doesn't have to create multiple copies, but can instead just refer to the same one over and over again (which of course wouldn't be possible if you were allowed to modify them) – Felix G Jul 14 '20 at 09:19
  • `char *foo = (char[]){'f', 'o', 'o', 0}; foo[2] = 'x'; /* now fox */` – pmg Jul 14 '20 at 09:19
  • I don't understand the question. What is the connection? – Sourav Ghosh Jul 14 '20 at 09:23
  • @FelixG Ooh!! It also says that `const` compound literals need not refer to distinct objects. And since string literals are implied to be `const`...makes sense. – Akay Jul 14 '20 at 09:24
  • @pmg `char *foo = (const char[]){'f', 'o', 'o', 0};` is writable, in contrast with `char foo[] = "foo";` – Akay Jul 14 '20 at 09:32
  • @Akay String literals are not implied to have `const` - it's simply undefined to modify it. Meaning, it may crash in one environment. But work as expected on another. – P.P Jul 14 '20 at 09:35
  • @SouravGhosh The question is, as a language designer what justification would one provide for the different treatment of string vs compound literals. I believe the question you've linked to sufficiently answers this. Thank you. – Akay Jul 14 '20 at 09:39
  • @pmg: FYI, `(char[]){'f', 'o', 'o', 0}` can be written `(char []){"foo"}`. – Eric Postpischil Jul 14 '20 at 10:01
  • I think string literals are not implied to be const-qualified for historical reasons. The `const` qualifier is a newer feature of the language (possibly borrowed from C++ during standardization), but string literals were already around since the early days of C. – Ian Abbott Jul 14 '20 at 10:08

1 Answer

Trying to do so with GCC results in a segfault as the literals get stored in read-only memory.

This is false in general.

A good counter-example is the Linux kernel.

It is compiled by GCC and stored in RAM. Read about undefined behavior: a segfault is one possible outcome, not a guaranteed one.

On the OSDEV wiki you can find mentions of other operating system kernels compiled by GCC and stored in writable RAM.

And with special linker flags, you can ask GCC to put string literals in writable memory. There are few reasons to do so in 2020 (except when you are coding your toy operating system).

You might write your own C compiler, or patch an existing one (such as nwcc or tinycc), to put string literals in some writable data segment.

You might be interested in static program analysis tools such as Frama-C or the Clang Static Analyzer.

Is there a reason why string literals are handled differently?

Yes: writing optimizing compilers is difficult. Also be aware of Rice's theorem. Consider using CompCert (you may need to pay for a license), or writing your own GCC plugin (it might, for example, store every string literal starting with the letter a in some data segment).

Martijn Pieters
Basile Starynkevitch