
I've been reading in various sources that string literals remain in memory for the whole lifetime of the program. In that case, what is the difference between these two functions?

char *f1() { return "hello"; }
char *f2() {
   char str[] = "hello";
   return str;
}

While f1 compiles fine, f2 complains that I'm returning stack allocated data. What happens here?

  • if the str points to the actual string literal (which has static duration), why do I get an error?
  • if the string literal is copied to the local variable str, where does the original string literal go? does it remain in memory with no reference to it?
Vlad from Moscow
blue_note
  • In `char str[] = "hello"`, the `"hello"` stays static and the `str` contains a copy of the string on the stack. You then return a pointer to that copy on the stack. – Blaze Jun 19 '19 at 12:37
  • (To be ultra pedantic, a string literal is the quoted string in the source code. When the program is translated [compiled], the string literal is used to initialize an array of static storage duration. That array is a different thing; it is not the string literal, but people often refer to it as the string literal.) – Eric Postpischil Jun 19 '19 at 12:47
  • @EricPostpischil: please, be pedantic!! Does the `char str[] = "hello"` copy this static array to an automatic one, element by element? And the static one still exists? If so, can you point to a reference, I can't find it in the specification.. – blue_note Jun 19 '19 at 13:00
  • @blue_note: The C standard defines C in terms of an abstract machine, an imaginary computer going through elementary steps. In that abstract machine, yes, the `"hello"` in `char str[] = "hello"` causes an array of static storage duration containing h, e, l, l, o, and the null character to be created, and that array is used to initialize `str`. Compilers do not have to implement the actual abstract machine; they can create any program that gets the same results (the same *observed behavior* as that is defined by the C standard). So they will optimize. – Eric Postpischil Jun 19 '19 at 13:05
  • @blue_note: If `char str[] = "hello";` appears outside a function, so it is static, the compiler is likely to put `str` in a part of the object module that is for initialized writeable data, with the h, e, l, l, o, and null character in it, and then there will be no separate array for the string literal anywhere in the object module. If it appears inside a function, so it is automatic and the initialization may have to be performed more than once, the compiler may store the string in non-writeable data and copy it each time the initialization is needed. So the static array will actually exist. – Eric Postpischil Jun 19 '19 at 13:08
  • @EricPostpischil So, in response to *In `char str[] = "hello"`, the `"hello"` stays static*, the `"hello"` doesn't need to exist as a string literal at all? – Andrew Henle Jun 19 '19 at 13:09
  • @blue_note: As for where this is specified, the C 2018 standard describes string literals in 6.4.5, of which paragraph 6 says, in part, “The multibyte character sequence is then used to initialize an array of static storage duration and length just sufficient to contain the sequence. For character string literals, the array elements have type `char`, and are initialized with the individual bytes of the multibyte character sequence.” – Eric Postpischil Jun 19 '19 at 13:10
  • @AndrewHenle: I am not clear on whether you are asking about the abstract machine, an actual implementation, the `"hello"` that is the string literal, or the data that the compiler has to have. If `char str[] = "hello"` appears outside a function, it may result in `str` being represented in the object module as space inside a section for initialized writeable data, and that space will contain the characters of the string. There will not need to be any separate object for the `"hello"`. Although in the abstract machine, `str` and `"hello"` both exist, the latter can be discarded in practice. – Eric Postpischil Jun 19 '19 at 13:15
  • @blue_note: For another example, if `char str[] = "a"` appears inside a function, the string is short enough that the compiler might not store the `a` and the null character as a static object in the object module. Instead, it might implement them as an immediate operand in the instruction stream. The abstract machine has a string literal, but the optimized code the compiler produces does not need to. – Eric Postpischil Jun 19 '19 at 13:18
  • The string literal in `sizeof "string"` might not be stored anywhere. – Ian Abbott Jun 19 '19 at 14:50

4 Answers


This

char str[] = "hello";

is a declaration of a local array that is initialized by the string literal "hello".

In effect, it is the same as if you had declared the array this way:

char str[] = { 'h', 'e', 'l', 'l', 'o', '\0' };

That is, the array has its own region of memory (with automatic storage duration), and that memory is initialized from the string literal.
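The equivalence of the two initializer forms can be checked directly. The following is only a minimal sketch; the function name `same_initialization` is made up for this illustration and does not appear in the question.

```c
#include <string.h>

/* Returns 1 if an array initialized from a string literal is byte-for-byte
   identical to one initialized element by element, 0 otherwise.
   The function name is hypothetical, chosen only for this illustration. */
int same_initialization(void)
{
    char a[] = "hello";                            /* size deduced as 6 */
    char b[] = { 'h', 'e', 'l', 'l', 'o', '\0' };  /* size deduced as 6 */

    return sizeof a == 6
        && sizeof a == sizeof b
        && memcmp(a, b, sizeof a) == 0;
}
```

Both arrays are ordinary local objects of type char[6]; neither one is the string literal itself.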

Once the function returns, the lifetime of the array ends.

That is, the function

char *f2() {
   char str[] = "hello";
   return str;
}

tries to return a pointer to the first element of the local character array str, which has automatic storage duration.

As for this function definition

char *f1() { return "hello"; }

here the function returns a pointer to the first character of the string literal "hello", which indeed has static storage duration.

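You can picture the first function definition roughly as follows (with the difference that the array behind a string literal, unlike this one, must not be modified):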

char literal[] = "hello";
char *f1() { return literal; }

Now compare where the arrays are defined in the first function definition and in the second function definition.

In the first function definition the array literal is defined globally while in the second function definition the array str is defined locally.

if the str points to the actual string literal (which has static duration), why do I get an error?

str is not a pointer. It is an array: a named region of memory that was initialized from the string literal and has the type char[6].

In the return statement

return str;

the array is implicitly converted to a pointer of type char * to its first element.

Functions in C and C++ may not return arrays. In C++ functions may return references to arrays.
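Since an array cannot be returned by value, a function that wants to hand back a modifiable copy of the string has to do it some other way. Two common options are sketched below, under the assumption that a fixed "hello" is all that is needed: let the caller own the buffer, or wrap the array in a struct, which can be returned by value. The names `fill_hello`, `struct hello_buf` and `make_hello` are hypothetical and not part of the original question.

```c
#include <stddef.h>
#include <string.h>

/* Option 1: the caller supplies the storage, so its lifetime is the
   caller's concern. Returns dst on success, NULL if dst is NULL or
   too small to hold "hello" plus the terminator. */
char *fill_hello(char *dst, size_t size)
{
    static const char text[] = "hello";

    if (dst == NULL || size < sizeof text)
        return NULL;
    memcpy(dst, text, sizeof text);   /* copies the terminating '\0' too */
    return dst;
}

/* Option 2: a struct that contains an array can be returned by value;
   the whole array is copied to the caller. */
struct hello_buf {
    char str[6];
};

struct hello_buf make_hello(void)
{
    struct hello_buf b = { "hello" };
    return b;
}
```

A caller would write something like `char buf[16]; fill_hello(buf, sizeof buf);` or `struct hello_buf b = make_hello();`. In both cases the characters live in an object the caller controls, so no pointer to a dead local escapes.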

Vlad from Moscow

I've been reading in various sources that string literals remain in memory for the whole lifetime of the program.

Yes.

In that case, what is the difference between those two functions

char *f1() { return "hello"; }
char *f2() {
   char str[] = "hello";
   return str;
}

f1 returns a pointer to the first element of the array represented by a string literal, which has static storage duration. f2 returns a pointer to the first element of the automatic array str. str has a string literal for an initializer, but it is a separate object.

While f1 compiles fine, f2 complains that I'm returning stack allocated data. What happens here?

  • if the str points to the actual string literal (which has static duration), why do I get an error?

It does not. In fact, it itself does not point to anything. It is an array, not a pointer.

  • if the string literal is copied to the local variable str, where does the original string literal go? does it remain in memory with no reference to it?

C does not specify, but in practice, yes, some representation of the string literal must be stored somewhere in the program, perhaps in the function implementation, because it needs to be used to initialize str anew each time f2 is called.
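That str is initialized anew on every call can be observed from inside a function. The sketch below uses a hypothetical function, `first_char_then_scribble`, which is not from the question: it reads the first character, overwrites it, and returns the value it read.

```c
/* Reads the first character of a fresh local array, then overwrites it.
   The overwrite is lost when the function returns, because the next call
   initializes a new array from the literal again.
   The function name is hypothetical. */
char first_char_then_scribble(void)
{
    char str[] = "hello";
    char first = str[0];

    str[0] = 'J';       /* legal: str is a modifiable local copy */
    return first;       /* returns a char value, not a pointer into str */
}
```

Every call returns 'h', no matter how often the function has run before, which is why some representation of the initializer has to survive in the program.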

John Bollinger
  • "> why do I get an error?" Because `return str;` is equivalent to `return &str[0];` and `str[0]` is an object with automatic storage duration. So `f2` returns a pointer to an object with automatic storage duration. The returned pointer is invalid for any caller of `f2`. – Ian Abbott Jun 19 '19 at 14:44

The string that you will see on your stack is not the string literal itself but a copy of it. In an ELF executable, the bytes of a string literal are typically stored in a read-only data section such as .rodata. (The ELF "string table" sections, such as .strtab, hold symbol and section names rather than the program's string literals.) Each time the function containing the declaration is entered and its stack frame is set up, the contents are copied from that static storage into the array on the stack; for short strings the compiler may instead encode the bytes directly in the instructions.

For background on ELF string tables, see: http://refspecs.linuxbase.org/elf/gabi4+/ch4.strtab.html

Isaaс Weisberg

char str[] = "hello"; is a special syntax which copies the string literal, and your function returns a pointer to this local variable, which is destroyed once the function returns.
char *f1() { return "hello"; } is correct but returning const char* would probably be better.
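The reason const char * is the safer return type is that attempting to modify the array behind a string literal is undefined behavior in C, even though its element type is plain char. A const-qualified variant might look like the sketch below; the name `f1_const` is hypothetical.

```c
/* Same as f1, but the return type tells callers not to write through
   the pointer. The name f1_const is hypothetical. */
const char *f1_const(void)
{
    return "hello";   /* points into an array of static storage duration */
}
```

With this signature, an accidental `f1_const()[0] = 'J';` is rejected at compile time instead of misbehaving at run time.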

Aykhan Hagverdili