
One usually associates 'unmodifiable' with the term 'literal':

char* str = "Hello World!";
*str = 'B';  // Bus Error!

However, when using compound literals, I quickly discovered that they are completely modifiable (and looking at the generated machine code, you can see that they are placed on the stack):

char* str = (char[]){"Hello World"};
*str = 'B';  // A-Okay!

I'm compiling with clang-703.0.29. Shouldn't those two examples generate the exact same machine code? Is a compound literal really a literal, if it's modifiable?

EDIT: An even shorter example would be:

"Hello World"[0] = 'B';  // Bus Error!
(char[]){"Hello World"}[0] = 'B';  // Okay!
hippietrail
hgiesel
  • I'm not even sure it's UB. I've never really looked at the official [language standard](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf), but 6.5.2.5, point 12, says that `(char[]){"abc"}` is designed to be modifiable. – hgiesel Apr 17 '16 at 12:09
  • Note that the example above doesn't actually showcase the lvalue-literal behaviour (a clearer example would be something like `(int){1} = 2;`) - the primary difference between your two snippets is that in the first you have a true string literal, while in the second you *initialize a local array* with a string literal - you'd have the same behaviour if you just modified the first to read `char str[] = ...`. – Alex Celeste Apr 17 '16 at 12:16
  • @Leushenko Do I? In the compound literal case, I initialize `str` with a pointer to its first character. When I say `char str[] = …`, I initialize a non-modifiable char literal and copy its content into the array `str` on the stack. – hgiesel Apr 17 '16 at 12:21
  • @hgiesel: You are right, it is in fact an "anonymous object". You can use the `const` qualifier to explicitly tell the compiler your intent. Note that in C it is the programmer's responsibility not to break this contract. Even for _string literals_, there is no guarantee a write will _not_ work (and C explicitly allows this as an implementation extension). So not getting an error does not mean it is defined behaviour. If in doubt, please read the standard. – too honest for this site Apr 17 '16 at 12:21
  • @Olaf The only question I have now is whether, when I say `char str[] = (char[]){"Hello"};`, I actually initialize it twice on the stack. – hgiesel Apr 17 '16 at 12:24
  • @hgiesel: No. 1) The C language does not even enforce using a stack (nor a heap, btw.) and there are implementations which don't. 2) There is an object allocated somewhere, plus the string literal to initialise it. But why should the same object be allocated twice? Your example using a string is not a good one, as you can use the literal directly, but it is nevertheless a valid one. – too honest for this site Apr 17 '16 at 12:27

2 Answers


A compound literal is an lvalue, and the values of its elements are modifiable. In the case of

char* str = (char[]){"Hello World"};
*str = 'B';  // A-Okay!  

you are modifying a compound literal, which is legal.

C11-§6.5.2.5/4:

If the type name specifies an array of unknown size, the size is determined by the initializer list as specified in 6.7.9, and the type of the compound literal is that of the completed array type. Otherwise (when the type name specifies an object type), the type of the compound literal is that specified by the type name. In either case, the result is an lvalue.

As can be seen, the compound literal has a complete array type and is an lvalue. Because that type is not const-qualified, the object it designates is modifiable, unlike a string literal.
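A minimal sketch of what the quoted paragraph implies, assuming a C99 or later compiler. The function name `lvalue_demo` is hypothetical and only serves the illustration: it checks that the array size is completed from the initializer, takes the address of a scalar compound literal, and writes through both.

```c
#include <assert.h>

/* Hypothetical helper: exercises the "result is an lvalue" wording. */
int lvalue_demo(void)
{
    /* Array of unknown size, completed by the initializer:
       11 characters plus the terminating NUL. */
    assert(sizeof (char[]){"Hello World"} == 12);

    /* A compound literal is an lvalue, so its address can be taken. */
    int *p = &(int){1};
    *p = 2;                         /* modifies the unnamed int object */

    char *s = (char[]){"Hello World"};
    s[0] = 'B';                     /* modifies the unnamed char array */

    return *p + (s[0] == 'B');      /* 2 + 1 */
}
```

Both unnamed objects live until the end of the enclosing block, so the pointers must not be used after the function returns.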

The standard also mentions that

§6.5.2.5/7:

String literals, and compound literals with const-qualified types, need not designate distinct objects.

It further says:

11 EXAMPLE 4 A read-only compound literal can be specified through constructions like:

(const float []){1e0, 1e1, 1e2, 1e3, 1e4, 1e5, 1e6}   

12 EXAMPLE 5 The following three expressions have different meanings:

"/tmp/fileXXXXXX"
(char []){"/tmp/fileXXXXXX"}
(const char []){"/tmp/fileXXXXXX"}

The first always has static storage duration and has type array of char, but need not be modifiable; the last two have automatic storage duration when they occur within the body of a function, and the first of these two is modifiable.
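To illustrate why the distinction in EXAMPLE 5 matters, here is a rough sketch. `fill_template` is a hypothetical stand-in for a function that writes into its argument, much as `mkstemp()` rewrites the trailing `X` characters; it is not a real library call. Passing it the modifiable compound literal is fine, whereas passing the plain string literal would be undefined behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a function that modifies its argument
   in place: every 'X' is replaced with '0'. */
size_t fill_template(char *tmpl)
{
    size_t replaced = 0;
    for (char *p = tmpl; *p != '\0'; ++p) {
        if (*p == 'X') {
            *p = '0';
            ++replaced;
        }
    }
    return replaced;
}

/* Fills a modifiable compound literal and copies the result into out,
   which must hold at least 16 bytes. */
size_t fill_modifiable_template(char *out)
{
    /* Second form from EXAMPLE 5: automatic storage, modifiable. */
    char *tmpl = (char[]){"/tmp/fileXXXXXX"};
    size_t n = fill_template(tmpl);
    assert(n == 6);
    strcpy(out, tmpl);   /* copy out before the unnamed array goes away */
    return n;
}
```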

13 EXAMPLE 6 Like string literals, const-qualified compound literals can be placed into read-only memory and can even be shared. For example,

(const char []){"abc"} == "abc"

might yield 1 if the literals’ storage is shared.
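A small sketch of EXAMPLE 6, with the hypothetical helper name `const_literal_shares_storage`. Whether it returns 1 or 0 is up to the implementation, so the code relies only on the contents being equal, not on the addresses.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the implementation chose to share
   storage between the const compound literal and the string literal,
   0 otherwise. Either result is allowed. */
int const_literal_shares_storage(void)
{
    const char *a = (const char[]){"abc"};
    const char *b = "abc";

    /* The contents always compare equal, whatever the addresses are. */
    assert(strcmp(a, b) == 0);

    return a == b;
}
```

The pointers are stored in variables first because some compilers warn about comparing directly against a string literal.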

haccks
  • Note that this question is tagged with C99 – nalzok Apr 17 '16 at 13:13
  • @sunqingyao: Yes. But for this case the rule is almost the same as in C11. – haccks Apr 18 '16 at 04:58
  • I find it irksome that (so far as I can tell) there's no syntax for const-static compound literals, since string literals are hardly the only kind of static data which will often have a single point of use. – supercat Apr 21 '16 at 18:37

The compound literal syntax is a shorthand expression equivalent to a local declaration with an initializer, followed by a reference to the unnamed object thus declared:

char *str = (char[]){ "Hello World" };

is equivalent to:

char __unnamed__[] = { "Hello World" };
char *str = __unnamed__;

The __unnamed__ object has automatic storage and is defined as modifiable, so it can be modified via the pointer str, which is initialized to point to it.
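The equivalence can be checked with a short sketch. The function names are hypothetical, and the named array is called `unnamed` rather than `__unnamed__` because identifiers beginning with two underscores are reserved.

```c
#include <assert.h>
#include <string.h>

/* Compound literal form; out must hold at least 12 bytes. */
void via_compound_literal(char *out)
{
    char *str = (char[]){ "Hello World" };
    str[0] = 'B';
    strcpy(out, str);
}

/* Spelled-out form with a named local array. */
void via_named_array(char *out)
{
    char unnamed[] = { "Hello World" };
    assert(sizeof unnamed == 12);   /* 11 characters plus NUL */
    char *str = unnamed;
    str[0] = 'B';
    strcpy(out, str);
}

/* Returns 1 if both forms produce the same modified string. */
int forms_agree(void)
{
    char a[12], b[12];
    via_compound_literal(a);
    via_named_array(b);
    return strcmp(a, b) == 0;
}
```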

In the case of char *str = "Hello World!"; the object pointed to by str is not supposed to be modified. In fact, attempting to modify it has undefined behavior.

The C Standard could have defined such string literals as having type const char[] instead of char[], but this would generate many warnings and errors in legacy code.

Yet it is advisable to pass a flag to the compiler to make such string literals implicitly const and to make the whole project const correct, i.e. to define all pointer arguments that are not used to modify their object as const. For gcc and clang, the command line option is -Wwrite-strings. I also strongly advise enabling many more warnings and making them fatal with -Wall -W -Werror (-W is the older spelling of -Wextra).
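As a sketch of what const-correct code looks like under -Wwrite-strings (the function names here are hypothetical): string literals are only ever reached through `const char *`, and a modifiable copy is made explicitly when a change is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns a pointer to a string literal. With -Wwrite-strings the
   literal has type const char[], so the const return type is required
   to compile without a warning. */
const char *greeting(void)
{
    return "Hello World!";
}

/* Writes a modified copy of the greeting into out, which must hold at
   least 13 bytes. The literal itself is never written to. */
void modified_greeting(char *out)
{
    assert(out != NULL);
    strcpy(out, greeting());
    out[0] = 'B';
}
```

With this flag, a declaration such as `char *str = "Hello World!";` draws a diagnostic at compile time instead of failing at run time.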

chqrlie
  • Unfortunately for the code I am writing, no, "*compound literal syntax is a short hand expression equivalent to a local declaration*" is not true. For instance, you can do ` foo[];` and get ` * sizeof()` bytes of uninitialised stack memory in the current scope, but there is no way to have uninitialised compound literals which only allocate, `( []) {}` is non standard and most compilers take it to be `{0}`. If there is a way, let me know. This would have been so useful. – user426 Nov 19 '21 at 01:43
  • @user426: this is a different question. BTW, your quote from my answer is incomplete: I wrote *The compound literal syntax is a short hand expression equivalent to a local declaration **with an initializer** [...]*. If you want to allocate some uninitialized stack space, you can use `alloca()` on systems where it is available. – chqrlie Nov 19 '21 at 13:58