22
#include <stdio.h>

const char * foo()
{
    return "abcdef";
}

int main()
{
    printf("%s", foo());
}

Can a conforming compiler decide to allocate "abcdef" on the stack? I.e. what in the standard forces the compiler to allocate it in the .data section?

Niall
qdii
  • 3
    There is no standard. Each compiler can implement it freely. In practice, any decent compiler will allocate this literal string in the RO-data section. Besides, function `foo` returns a pointer, so you most certainly **do not** want this pointer to point to an address in the stack of this function. – barak manos Nov 17 '14 at 12:13
  • 1
    Maybe you want to read this - http://www.nongnu.org/avr-libc/user-manual/mem_sections.html – ha9u63a7 Nov 17 '14 at 12:14
  • 2
    @barakmanos That you don't want to return pointers to objects with automatic storage duration, e.g. ones that are created on the stack, is probably exactly why the OP asked the question. – Peter - Reinstate Monica Nov 17 '14 at 12:29
  • 3
    @barakmanos: Yes there is a standard. String literals have static storage duration, so can't be placed on the stack. – Mike Seymour Nov 17 '14 at 12:36
  • @MikeSeymour: OK, thanks for pointing that out. I indeed removed that specific statement from my answer below because I wasn't 100% sure of it. – barak manos Nov 17 '14 at 12:37
  • 8
    the standard doesn’t talk about stack and heap or .data or .bss. It talks about storage duration, visibility etc. – bolov Nov 17 '14 at 12:40
  • @Bolov It can still be answered using the standard (see the answers.) –  Nov 17 '14 at 12:44
  • 1
    @bolov: True, but the concept of storage duration maps directly onto the corresponding implementation. Automatic storage must have a stack-like structure (and is called "the stack" in various parts of the standard); the "free store" used for dynamic storage is just another name for a "heap"; and static storage must be in persistent blocks of memory that we might as well call "data sections". – Mike Seymour Nov 17 '14 at 12:45
  • @MikeSeymour note since this is also tagged C, I wanted to clarify that neither C99 nor C11 refer to the *stack* but the C++ standard does. – Shafik Yaghmour Nov 17 '14 at 12:53
  • 1
    @ShafikYaghmour: OK, only C++ does. I didn't notice that this was tagged with multiple languages. But even in C, automatic storage must have a stack-like structure, so we might as well call it "the stack" unless we're in an abnormally pedantic mood. – Mike Seymour Nov 17 '14 at 12:54
  • @MikeSeymour I don't disagree with anything you said, I just wanted to clarify – Shafik Yaghmour Nov 17 '14 at 12:56
  • +1 This is an interesting question, can you provide some context, I am curious what drove this question. – Shafik Yaghmour Nov 17 '14 at 13:11
  • @ShafikYaghmour I was just reviewing a colleague's code and I thought "I know this will work in most cases, but maybe under certain circumstances it will not? maybe if optimizations are turned off, or if we compile an exotic type of binary executable that does not have a .data section". That's about it :) – qdii Nov 17 '14 at 13:20
  • @qdii it is great that you challenge assumptions made in code and try to understand how it might break. Optimizations around string literals are a very interesting topic; you may find [String Literal address across translation units](http://stackoverflow.com/q/26279628/1708801) an interesting read. – Shafik Yaghmour Nov 17 '14 at 14:16
  • How is this not a duplicate more than 6 years after Stack Overflow launched? – Peter Mortensen Nov 17 '14 at 21:00
  • @PeterMortensen because the search facility is not great... – M.M Nov 23 '14 at 19:35

5 Answers

26

From the C++ specification, § 2.14.5/8 on string literals:

Ordinary string literals and UTF-8 string literals are also referred to as narrow string literals. A narrow string literal has type “array of n const char”, where n is the size of the string as defined below, and has static storage duration (3.7).

It is also worth mentioning that static storage duration applies to all string literals, including L"", u"" and U"" (§ 2.14.5/10-12).

In turn, for static storage duration, § 3.7.1/1:

All variables which do not have dynamic storage duration, do not have thread storage duration, and are not local have static storage duration. The storage for these entities shall last for the duration of the program (3.6.2, 3.6.3).

Hence, your string "abcdef" shall exist for the duration of the program. The compiler can choose where to store it (and this may be a system constraint), but it must remain valid.

For the C language specification (C11 draft n1570), string literals, § 6.4.5/6:

In translation phase 7, a byte or code of value zero is appended to each multibyte character sequence that results from a string literal or literals. The multibyte character sequence is then used to initialize an array of static storage duration and length just sufficient to contain the sequence. For character string literals, the array elements have type char, and are initialized with the individual bytes of the multibyte character sequence.

And static storage duration, § 6.2.4/3:

An object whose identifier is declared without the storage-class specifier _Thread_local, and either with external or internal linkage or with the storage-class specifier static, has static storage duration. Its lifetime is the entire execution of the program and its stored value is initialized only once, prior to program startup.

The same reasoning about location applies (where the array is stored will most likely be dictated by the system), but it must remain valid for the duration of the program.

Niall
  • So L"abcdef" is a narrow string literal too? I thought it would be a wide string? – qdii Nov 17 '14 at 12:22
  • 1
    @qdii. `L"abcdef"` is a wide string literal. The static storage applies for all the string literals 2.14.5/10-12 – Niall Nov 17 '14 at 12:24
6

what in the standard forces the compiler to allocate it on the .data section?

Nothing. But it certainly cannot be on the stack, since pointers to a string literal must never be invalidated (the literal has static storage duration, see 1) below), and values on the stack get overwritten by other frames at some point. Objects with static storage duration usually live in a section dedicated to them, such as .data (or, for read-only data like string literals, typically a read-only data section).

Under the as-if rule, the compiler could put it on the stack if the observable behavior of the program doesn't change. That is very unlikely to happen, though, since it wouldn't benefit the performance of the program in any way (and compilers that do such pointless things have yet to be written).


1) [lex.string]/8:

Ordinary string literals and UTF-8 string literals are also referred to as narrow string literals. A narrow string literal has type “array of n const char”, where n is the size of the string as defined below, and has static storage duration (3.7).

Columbo
6

Referring to N1570 (C11 draft) 6.4.5/6 String literals (emphasis mine going forward):

In translation phase 7, a byte or code of value zero is appended to each multibyte character sequence that results from a string literal or literals. The multibyte character sequence is then used to initialize an array of static storage duration and length just sufficient to contain the sequence. For character string literals, the array elements have type char, and are initialized with the individual bytes of the multibyte character sequence.

This means that string literals have a lifetime spanning the whole execution of the program, as stated in 6.2.4/3 Storage durations of objects:

An object whose identifier is declared without the storage-class specifier _Thread_local, and either with external or internal linkage or with the storage-class specifier static, has static storage duration. Its lifetime is the entire execution of the program and its stored value is initialized only once, prior to program startup.

It's unlikely that a compiler would place them on the stack, given the stack's nature (hint: the data must be preserved across function calls).

Note that the C Standard does not explicitly forbid placing string literals on the stack. In fact, it does not even define terms such as stack or .data section. It is up to the compiler to choose any data placement that conforms to the Standard.

Grzegorz Szpetkowski
3

Previous answers have already quoted from the standard, so I'll go with the logical approach instead.

You can copy this literal string from the RO-data section onto the stack every time the function is called:

const char* foo()
{
    const char str[] = "abcdef";  // automatic array initialized from the literal
    return str;                   // bug: str ceases to exist when foo() returns
}

But this function returns a pointer.

And you most certainly do not want this pointer to contain an address on the stack.

So it makes no sense to have that literal string allocated on the stack to begin with.

barak manos
  • I don't understand your example. This code does indeed copy the literal from .rodata to the stack. And returning a pointer to a local variable will be a bug no matter where the string literal is stored. For this example to make any sense, it would have to be `return "abcdef";`. You can't rely on the compiler to optimize away the copy-down from .rodata. – Lundin Nov 17 '14 at 12:33
  • @Lundin: I am trying to explain why it makes no sense to have that string of characters allocated on the stack. – barak manos Nov 17 '14 at 12:34
  • (I didn't down vote, since the answer is correct. I just found the example confusing.) – Lundin Nov 17 '14 at 14:39
  • @Lundin: I understand, that's why I added that statement at the bottom. Some users here tend to down-vote without leaving a comment, and that is really annoying... – barak manos Nov 17 '14 at 14:40
3

Everything with static storage duration must remain allocated until the program exits. Such objects could be located on the stack, but only if they are allocated before any user code is executed. That design would be unusual, but it might be advantageous in, e.g., a plug-in architecture where several threads should run a plug-in simultaneously and every thread's instance should behave completely independently. If the architecture had all instances of a plug-in share the same static data, then, at least from the standpoint of the plug-in architecture, data which shouldn't be shared shouldn't be static.

While it might arguably be better to have each instance store its static data in a block of space requested from the heap, that would require each instance to free that block when it was done. Having each plug-in instance allocate all its data on its stack (including a suitably sized char[] that would be subdivided to satisfy malloc() or new requests) prior to running user code would ensure that killing the thread associated with an instance also frees the storage associated with it.

supercat
  • An example with harvard architecture with data section above stack ( http://www.nongnu.org/avr-libc/user-manual/malloc.html ) with the right flags for an AVR, on start up, the data section is copied from flash, then further stack allocations move the stack pointer down from there ( though the .data section copy isn't via the stack mechanism, just can be made contiguous with it ) – Pete Kirkham Nov 17 '14 at 23:11
  • @PeteKirkham: The architectures I've used all have data sections which could be located independently of the heap. It's not uncommon for systems to have a small fast memory and a big slow memory, and for code to arrange to have all static objects and the stack in the small fast memory, but that wouldn't require that stuff be placed on the stack in any semantic sense. – supercat Nov 17 '14 at 23:17