In this question it was said in the comments:

char arr[10] = { 'H', 'e', 'l', 'l', 'o', '\0'}; and char arr[10] = "Hello"; are strictly the same thing. – Michael Walz

This got me thinking.

I know that "Hello" is a string literal. String literals are stored with static storage duration and are immutable.

But if both are really the same, then char arr[10] = { 'H', 'e', 'l', 'l', 'o', '\0'}; would also create a similar string literal with static storage duration.

Does char b[10]= {72, 101, 108, 108, 111, 0}; also create a "string" literal with static storage duration? Because theoretically it is the same thing.

char a = 'a'; is the same thing as char a; ...; a = 'a';, so your thoughts are correct: 'a' is simply written to a.

Are there differences between:

  • char a = 'a';
  • char a = {'a'};

How/where are the differences defined?

EDIT: I see that I haven't made it clear enough that I am particularly interested in the memory usage/storage duration of the literals. I will leave the question as it is, but would like to make the emphasis of the question more clear in this edit.

Kami Kaze
  • Regarding your initialization of `b`, if the system is using ASCII then yes that's equal as well. – Some programmer dude Apr 12 '18 at 07:56
  • @Someprogrammerdude Does that mean that any array initializer will be stored with static duration? – Kami Kaze Apr 12 '18 at 08:12
  • @KamiKaze _Are there differences between: char a = 'a'; char a = {'a'}; How/where are the differences defined?_ Actually you don't need to care. During initialisation the content of the array initializer is _copied_ into the array and there is no way to access the _original_ data of the array initializer anyway. – Jabberwocky Apr 12 '18 at 08:15
  • @MichaelWalz I do care, because I'd like to understand how it works and because of the memory usage (given that is not really a concern). – Kami Kaze Apr 12 '18 at 08:23
  • @KamiKaze [this](https://www.godbolt.org/) might be a place for you. – Jabberwocky Apr 12 '18 at 08:27

4 Answers

I know that "Hello" is a string literal. String literals are stored with static storage duration and are immutable.

Yes, but string literals are also a grammatical item in the C language. char arr[10] = { 'H', 'e', 'l', 'l', 'o', '\0'}; does not contain a string literal; it uses an initializer list. The initializer list does, however, behave as if it had static storage duration, and the remaining elements after the explicit \0 are set to zero.

The initializer list itself is typically stored in some manner of ROM memory. If your variable arr has static storage duration too, it will get allocated in the .data segment and initialized from the ROM init list before the program is started. If arr has automatic storage duration (local), then it is initialized from ROM at run time, each time the function containing arr is called.

The ROM memory where the initializer list is stored may or may not be the same ROM memory as used for string literals. Often there's a segment called .rodata where these things end up, but they may as well end up in some other segment, such as the code segment .text.

Compilers like to store string literals in a particular memory segment, because that means that they can perform an optimization called "string pooling". Meaning that if you have the string literal "Hello" several times in your program, the compiler will use the same memory location for it. It may not necessarily do this same optimization for initializer lists.


Regarding 'a' versus {'a'} as an initializer, that's just a syntax quirk in the C language. C11 6.7.9/11:

The initializer for a scalar shall be a single expression, optionally enclosed in braces. The initial value of the object is that of the expression (after conversion); the same type constraints and conversions as for simple assignment apply,

In plain English, this means that a "non-array" (scalar) can be initialized either with or without braces; both forms have the same meaning. Apart from that, the same rules as for regular assignment apply.

Lundin
  • To be pedantic: the initializer list doesn't behave like anything. You mean to say that initialization of an automatic array from a list behaves the same as initialization of a static array from a list. I say this because people sometimes have the misconception that a braced list is actually an object and that the syntax initializes the variable from the object the list represents – M.M Apr 12 '18 at 11:50
  • @M.M There's plenty of behavior specified for initializer lists throughout the whole of chapter 6.7.9. People don't have misconceptions that the braced list is actually an object. While it isn't an object from a C language point of view, the initializer list has allocated storage and an address of its own on all true ROM systems. For convenience, the initializer list is stored in the same type as the object it is to initialize, or otherwise it would be a major pain to write the "CRT" which initializes `.data`. Though on RAM-based, PC-like computers, this part isn't necessary. – Lundin Apr 12 '18 at 12:02

I know that "Hello" is a string literal. String literals are stored with static storage duration and are immutable.

Yes. But with char arr[10] = "Hello";, you are copying the string literal into the array arr, and there's no need to "keep" the string literal. So if an implementation chooses to remove the string literal altogether after copying it to arr, that's totally valid.

But if both are really the same, then char arr[10] = { 'H', 'e', 'l', 'l', 'o', '\0'}; would also create a similar string literal.

Again there's no need to make/store a string literal for this.

Only if you directly have a pointer to a string literal would it usually be stored somewhere, as in: char *string = "Hello, world!\n";

Even then an implementation can choose not to do so under the "as-if" rule. E.g.,

#include <stdio.h>
#include <string.h>

static const char *str = "Hi";
int main(void)
{
   char arr[10];
   strcpy(arr, str);
   puts(arr);
}

"Hi" can be eliminated because it's used only for copying into arr and isn't accessed directly anywhere. So eliminating the string literal (and the strcpy call too), as if you had written char arr[10] = "Hi";, wouldn't affect the observable behaviour.

Basically, the C standard doesn't require a string literal to be stored anywhere, as long as the properties associated with a string literal are satisfied.

Are there differences between: char a = 'a'; char a = {'a'}; How/where are the differences defined?

No, they are equivalent. C11, 6.7.9 says:

The initializer for a scalar shall be a single expression, optionally enclosed in braces. [..]

Per the syntax, even char c = {'a',}; is valid and equivalent (though I wouldn't recommend this :).

P.P
  • So a string literal might not be allocating memory for the runtime of the program? – Kami Kaze Apr 12 '18 at 09:10
  • It may or may not be allocated - there's no general answer (depends on the implementation, whether it's actually possible to eliminate the string literal in question, optimisation levels, etc). Specifically for the case of array copying in the question, gcc (7.2) happily eliminates it when compiled with `-O3`. – P.P Apr 12 '18 at 09:14
  • "So if an implementation chooses to do remove the string literal altogether after copying it to arr and that's totally valid." How on earth would it be possible for the program to remove parts of its own ROM during execution? No such programs exist. More likely, on a RAM-based system, the string literal is copied into the variable at start-up, but then the variable itself needs to have static storage duration. And that only happens if the compiler can deduce that the string literal isn't used again elsewhere in the program. – Lundin Apr 12 '18 at 10:01
  • Also to be picky your example is incorrect, because a static storage file scope variable fills remaining bytes with zeros. strcpy() on a local variable does not do this. Therefore the compiler might not be able to optimize the code - it will have to know that the trailing zeroes are never used by the program. Which in turn means that `puts` has to be in the same translation unit as `main` (which it is not) or the compiler won't be able to tell what `puts` does. – Lundin Apr 12 '18 at 10:05
  • On such a system an "implementation" would know it's not possible to do so. "but then the variable itself needs to have static storage duration" In the specific sentence you quoted from my answer, it refers to `arr` and there's no need to keep a "variable" there. – P.P Apr 12 '18 at 10:07
  • "Also to be picky your example is incorrect .... " All that is correct. That's why I said an implementation can transform the code as it chooses, as long as the "observable" behaviour is respected. And for that, it needs to know what `puts` might do too. – P.P Apr 12 '18 at 10:09

In the abstract machine, char arr[10] = "Hello"; means that arr is initialized by copying data from the string literal "Hello" which has its own existence elsewhere; whereas the other version just has initial values like any other variable -- there is no string literal involved.

However, the observable behaviour of both versions is identical: arr is created with its values set as specified. This is what the other poster meant by the code being identical; according to the Standard, two programs are the same if they have the same observable behaviour. Compilers are allowed to generate the same assembly for both versions.


Your second question is entirely separate to the first; but char a = 'a'; and char a = {'a'}; are identical. A single initializer may optionally be enclosed in braces.

M.M
  • It is not observable in the functionality, and that's what the standard is talking about. But it would make a difference in memory usage (I was curious about that). Side question: I got some very interesting answers out of this question. But no one voted on it. I don't want to force a vote here, but does the question lack anything, or is it just that no downvote is enough of a "compliment" (in the [c] tag at least)? – Kami Kaze Apr 12 '18 at 12:08
  • @KamiKaze Memory usage isn't observable behaviour (according to the standard). There does not actually have to be any memory area corresponding to string literals or anything else, if the compiler thinks it can do without it – M.M Apr 12 '18 at 12:29

I believe your question is highly implementation-dependent (hardware- and compiler-wise). However, in general, arrays are placed in RAM, whether they are global or not.

I know that "Hello" is a string literal. String literals are stored with static storage duration and are immutable.

Yes, this saves the string "Hello" in ROM (read-only memory). Your array is loaded with the literal at run time.

But if both are really the same, then char arr[10] = { 'H', 'e', 'l', 'l', 'o', '\0'}; would also create a similar string literal.

Yes, but in this case the single characters are placed in ROM. The array you are initializing is loaded with the character literals at run time.

Does char b[10]= {72, 101, 108, 108, 111, 0}; also create a "string" literal with static storage duration? Because theoretically it is the same thing.

If you use UTF-8, then yes, since char == uint8_t and those are the values.

Are there differences between: char a = 'a'; char a = {'a'}; How/where are the differences defined?

I believe not.

In reply to edit

Do you mean the lifetime of storage of string literals? Have a look at this.

So a string literal has static storage duration. It remains throughout the lifetime of the program, hardcoded in memory.

  • Please take a look at my edit. I think I haven't made my intentions clear enough, maybe you want to change your answer accordingly – Kami Kaze Apr 12 '18 at 09:17
  • @KamiKaze Did my edit answer your question correctly? –  Apr 12 '18 at 10:26
  • it is more about the differences in this regarding the different initializations. – Kami Kaze Apr 12 '18 at 10:30
  • @KamiKaze I think the difference is that when you write "Hello", the literal is stored as a whole. When you write characters or digits, the literals are stored individually and then loaded into the array. Otherwise, they all have static storage duration. Of course, the array is in RAM and modifiable. –  Apr 12 '18 at 10:33