
In C (at least), local string variables are generally allocated in the .rodata section or the .data segment.

int main(){
    char string[] = "hello world!"; //this is allocated statically in the .rodata section not in the stack.
}

Why not store them on the stack, since it is a local variable? Will the string data be in memory for the full execution time? Is that not a waste of memory?

Adrian Mole
Carlitos_30
  • You never use string, so it isn't allocated. The string literal is in rodata. – stark Jul 24 '22 at 12:39
  • The compiler usually doesn't know the location of the stack when the process begins. How can it then put data there directly and not copy it from somewhere else? – Some programmer dude Jul 24 '22 at 12:43
  • `string` will be located in the stack in your example. – interjay Jul 24 '22 at 12:44
  • The question is, where will it get the string content from when necessary if it won’t store it in the .rodata? – numzero Jul 24 '22 at 12:51
  • "strings" are not allocated at all. Ever. They are one kind of value that a character array can hold, just like even numbers are a kind of value that an `int` can hold. Arrays that can or do contain strings can be allocated. Your `string` variable is such an array, and string literals also correspond to such arrays. This is one of many places where common language usage obscures technical detail that is sometimes important to understand, to the detriment of the uninitiated. – John Bollinger Jul 24 '22 at 13:22
  • I am not answering the edited question here because it should be posted as a new question, but assembly lacks the concept of 'scope' available to C; static allocation is simply the path of least resistance. You could store the data on a stack, but you would have to write the code for that explicitly; it is not intrinsic to the language as it is in C, except perhaps as a macro or pseudo-op on some assemblers. That is the purpose of HLLs - to abstract that which you would otherwise have to code explicitly. – Clifford Jul 24 '22 at 15:18

3 Answers


How have you drawn the conclusion that string is statically allocated? It is not. The literal string you are using as an initialiser is what is in .rodata.

It is possible that, as an optimisation, if string is never modified the compiler will translate all references to it into references to the literal string. In your example, however, it is also likely that the optimiser will eliminate it altogether, since it is neither read, written nor referenced.

Consider:

#include <stdio.h>
int main(void)
{
    volatile int stack_var = 0 ;
    volatile char string[] = "hello world!" ;
    /* %p expects a void*, so each address is cast explicitly */
    printf( "&stack_var = %p\n", (void*)&stack_var ) ;
    printf( "&string = %p\n", (void*)string ) ;
    printf( "&\"literalstring\" = %p\n", (void*)"literal string" ) ;
    return 0 ;
}

Then consider the veracity of your assertion.

Example output from the above code at https://onlinegdb.com/BZ9yITMFY:

&stack_var = 0x7fff22144974
&string = 0x7fff2214497b
&"literalstring" = 0x55c3b4d86023

Clearly, the string literal is in an entirely different region from both stack_var and string, and it is likely that what you are observing is the location of the literal initialiser string "hello world!" and not the location of the variable string. The initialiser data is copied to string on instantiation. Moreover, as a local variable in a function, it is reinstantiated and therefore re-initialised every time the function is called.

Further consider:

const char string2[] = "another" ;
const char* string3 = "one more" ;

&string2 = 0x7ffd029fa573
&string3 = 0x5590b3376058

string3 points directly to the string literal, so the characters occupy no stack space in this case (only the pointer itself does, unless it is optimised away). If you want a symbol that refers to a constant (read-only) string, that is the most memory-efficient method.

That said it is common in C to use macros for string literal symbols:

#define STRING4 "A Literal String"

which then relies on the linker to amalgamate duplicate string literals (which any reasonable linker will do, but it is not a requirement). Unlike string3, however, STRING4 can itself be used as an array initialiser.

Clifford
  • I haven't yet checked the memory at execution time, but the object code shows me that the string data is in the .rodata section: $ readelf -x .rodata globallocal.o Hex dump of section '.rodata': 0x00000000 204c6f72 656d2069 7073756d 20646f6c Lorem ipsum dol 0x00000010 6f722073 69742061 6d65742c 20636f6e or sit amet, con 0x00000020 73656374 65747572 20616469 70697363 sectetur adipi, etc. – Carlitos_30 Jul 24 '22 at 13:02
  • @Carlitos_30 : Your comment suggests that you have not understood my explanation. I have since added to it - hopefully it is now clear. The "stack" does not exist in the object code - it exists at run-time only. The data used for initialisation has to exist somewhere (in the executable) - it is not magic! The point is that the _string data_ exists in .rodata, but the _variable_ `string` does not. – Clifford Jul 24 '22 at 13:08
  • And I have long assumed that "the data for initialisation has to exist somewhere" is prominent among the reasons that C specifies that *string literals* have static storage duration. But that's by no means the only reason. After all, string literals are used for more than array initialization. – John Bollinger Jul 24 '22 at 13:11
  • Note that I have used `volatile` to ensure that no confounding optimisation occurs, but in my test, the results remain the same even when applying -O3 in gcc with or without `volatile` - YMMV. – Clifford Jul 24 '22 at 13:16
  • Thanks, I edited the question after checking that the string is indeed loaded onto the stack. By the way, does the .rodata segment not exist at execution time? Is it only for the compiler? – Carlitos_30 Jul 24 '22 at 15:01
  • @Carlitos_30 : your edit poses a new and substantially different question. It is not good form (i.e. it annoys those that took the trouble to answer the original) to change a question that already has answers. You would do well to roll that back so as not to invalidate existing answers and post a new question. – Clifford Jul 24 '22 at 15:07

The C Standard doesn't really define concepts such as "stack allocation" and program "data segments"; instead, it uses the general concept of an "abstract machine".

However, for string literals (an example of which is your "hello world!"), it does specify something about their storage. In the example you have given, you appear to have some confusion between the nature of – and storage for – the string[] array and the nature of the string literal that is used to initialize that array (see Clifford's answer for more on that difference).

From this C11 Draft Standard [1] (bold emphasis mine):

6.4.5 String Literals


6 In translation phase 7, a byte or code of value zero is appended to each multibyte character sequence that results from a string literal or literals. The multibyte character sequence is then used to initialize an array of static storage duration and length just sufficient to contain the sequence.

Thus, in order to conform to the Standard, a C compiler cannot place such a literal 'on the stack' and it must ensure that the created data remains 'alive' for the full duration of program execution.


[1] The relevant section in this C17 Draft Standard has no significant change.

Adrian Mole
  • Yes, but why is this? The only reason that occurs to me is that pushing the string data onto the stack would require more memory, since machine instructions are needed to push every character, and those instructions occupy code segment memory (the instructions and the characters, like mov ax, 'b'; push ax). Would this be the reason? – Carlitos_30 Jul 24 '22 at 12:59
  • @Carlitos_30 It is neither my place nor within my ability (generally) to explain *why* the C Standard dictates what it dictates. However, for string literals, one possibility is that a compiler/linker that uses so-called [*string pooling*](https://stackoverflow.com/q/11399682/10871073) would break if an attempt at optimization were made. – Adrian Mole Jul 24 '22 at 13:02
  • The OP seems to be asking about variable `string`, as opposed to about the string literal. Or perhaps they don't recognize that these are distinct things, with distinct identity and storage. – John Bollinger Jul 24 '22 at 13:14
  • @John Yeah - I also started to think that, especially after seeing Clifford's answer. I'll try to add a 'prelude' to my answer, without impinging on (or copying from) anything said by Clifford. ... – Adrian Mole Jul 24 '22 at 13:19
  • @AdrianMole Feel free to duplicate/paraphrase/restate or whatever anything from my answer - after all only one can be "accepted" and if the useful information is distributed across multiple answers, how can the OP choose? Personally I am refraining from getting technical with ISO standards etc. If the standard was a good way of understanding the language (rather than, say, building a compiler), we would not need all those C programming references ;-). My example merely demonstrates that the assertion is false, and is a misinterpretation of an accurate observation. – Clifford Jul 24 '22 at 13:44
  • @Clifford It's not about 'winning' or points, as far as I'm concerned. It's about not repeating information. Our two answers address different aspects of the OP's issue and, as such, I think both are worthwhile. Nothing wrong with having multiple answers to a question. – Adrian Mole Jul 24 '22 at 13:46
  • @AdrianMole : You responded too quickly - I removed that phrase in the edit - it was meant to be humorous though, not competitive ;-). No of course multiple answers are great. But if the OP finds both useful in different aspects, he might struggle to indicate an "accepted" answer. That said in this case neither answer is necessarily "partial". I often repeat information already provided in other answers for two reasons: 1) I will not necessarily read all other answers, 2) I might think I can explain something more clearly (not always true, but in my head it starts out that way!). – Clifford Jul 24 '22 at 13:52

Why not store them on the stack, since it is a local variable?

"The stack" only exists at runtime - once the program terminates, any data that is stored there ceases to exist. The "hello world!" string literal that is used to initialize the string array has to somehow be persisted between runs of the program, so it's stored as part of the program image itself in segments like .rodata.

Of course, depending on how you use the string array and how aggressively the code is optimized, storage for it may not be allocated at all; if you never try to update its contents or take the address of it or any of its elements, the compiler may simply replace all references to it with references to the literal.

John Bode