69

Why does the first function return the string "Hello, World" while the second function returns nothing? I thought the return values of both functions would be undefined, since both return data that is out of scope.

#include <stdio.h>
// This successfully returns "Hello, World"
char* function1()
{
    char* string = "Hello, World!";
    return string;
}
// This returns nothing
char* function2()
{
    char string[] = "Hello, World!";
    return string; 
}

int main()
{
    char* foo1 = function1();
    printf("%s\n", foo1); // Prints "Hello, World"
    printf("------------\n");
    char* foo2 = function2(); // Prints nothing
    printf("%s\n", foo2);
    return 0;
}
Tobs
  • 1
    @haccks How is it not a duplicate? The reason why one version works and another doesn't, is because in one case the function returns a pointer to a local variable. – Lundin Sep 07 '17 at 07:59
  • 8
    There are many canonical duplicates that can be used: [What is the difference between char s and char *s?](https://stackoverflow.com/questions/1704407/what-is-the-difference-between-char-s-and-char-s), [String literals: Where do they go?](https://stackoverflow.com/questions/2589949/string-literals-where-do-they-go), [How to access a local variable from a different function using pointers?](https://stackoverflow.com/questions/4570366/how-to-access-a-local-variable-from-a-different-function-using-pointers). – Lundin Sep 07 '17 at 07:59
  • 9
    @Leushenko, think not. The question here is about the return value from a function, not just the difference between `char[]` and `char *`. – machine_1 Sep 07 '17 at 08:33
  • 33
    I tire of the admins here aggressively declaring that well written newbie questions are duplicates. The OP knew enough C to ask the question in a way that was helpful to themselves. 'Translating' a question asked in another way often requires already having the skills that the OP is trying to develop. – verisimilidude Sep 07 '17 at 08:37
  • 26
    [A duplicate is not the same as a bad question](https://stackoverflow.blog/2010/11/16/dr-strangedupe-or-how-i-learned-to-stop-worrying-and-love-duplication/), and unlike other close votes it doesn't mean the user's question isn't valid. To me this q. is an example of the rare "upvote+close" combo: this is a great first question, but it needs to belong to the greater "array variable vs. pointer" family of questions for full understanding, and the fundamental truth behind the answers is going to be the same. – Alex Celeste Sep 07 '17 at 08:45
  • By the way, if you need to print un-formatted output, it is better to use `fputs()` function. Also the `puts()` function would be a choice if you want a new-line to follow. – machine_1 Sep 07 '17 at 08:58
  • 11
    @verisimilidude - The idea is that linking all similar questions to each other should help people find all the good answers in one place. If nothing else, that helps Google figure out which post they should put at the top. – Bo Persson Sep 07 '17 at 09:19
  • 3
    @verisimilidude linking questions together can be done by anyone with enough experience. The admins/moderators rarely intervene – M.M Sep 07 '17 at 11:16
  • 1
    @verisimilidude: as Leushenko said, a duplicate does not mean the question is of bad quality. It links questions of the same category together, making them appear in the "Linked" block on the right-hand side of the page. This way the asker _as well as_ a later searcher is able to look at more resources and different approaches. – try-catch-finally Sep 08 '17 at 06:22

6 Answers

71

the second function returns nothing

The string array in the second function:

char string[] = "Hello, World!";

has automatic storage duration. It does not exist after the control flow has returned from the function.

Whereas string in the first function:

char* string = "Hello, World!";

points to a string literal, which has static storage duration. That means the string still exists after the function returns. What you are returning from the function is a pointer to this string literal.
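
To make the difference concrete, here is a minimal sketch of two variants that are safe to return from. The function names `greeting_literal` and `greeting_static_array` are made up for illustration and do not appear in the question.

```c
/* Returns a pointer to a string literal. The literal has static storage
   duration, so the pointer stays valid after the function returns.
   const is used because the literal must not be modified. */
const char *greeting_literal(void)
{
    const char *string = "Hello, World!";
    return string;
}

/* Here the array itself is declared static, so it also outlives the call.
   Every call returns the address of the same single array, and changes
   made through the returned pointer are visible to later callers. */
char *greeting_static_array(void)
{
    static char string[] = "Hello, World!";
    return string;
}
```

Both results can be passed to `printf("%s\n", ...)` in `main`. Only the second one may be modified by the caller, and since there is just one shared array, that choice is not a good fit for code that needs independent copies.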

JFMR
  • 28
    I think this could use some clarification. In both cases the string literal `"Hello, World!"` has static storage duration. In the array case (`function2`), we create an automatic array called `string` and copy the static string into it. In the pointer case (`function1`), we create an automatic pointer called `string` which points into the static storage area. – M.M Sep 07 '17 at 11:18
  • 1
    Exactly. Change the line to `static char string[] = "Hello, world!";` and see what happens. This changes the storage duration of the array from automatic to static. (Also, you should get in the habit of declaring data you don’t need to modify `const`.) – Davislor Sep 07 '17 at 13:37
  • 1
    Does that mean that all `char*` are static? If I wrote `static char* string = "Hello, World!"` would that change anything. – Tobs Sep 07 '17 at 14:01
  • 1
    That would make `string` a static pointer to a string literal, which in this case just wastes a few bytes of memory. If you call the same function again, the `static` variable will still exist and have been set. The difference between `char* string` and `char string[]` in this case is that the former points to the string constant, and the latter makes a temporary copy of it in a different chunk of memory. You would normally do that so you can modify the copy. Declaring the copy `static` makes the array persist after the function exits. It will still be valid after the function returns. – Davislor Sep 07 '17 at 14:33
  • 2
    It is the case that a string literal, such as `"Hello, world!"`, has static storage. String literals are kind of weird for historical reasons. It’s only legal to store them in a `char*` instead of a `const char*` for backward compatibility with code written before `const` existed. But that leads to bugs when you do try to modify the string through the non-`const` pointer. It would be even better to declare it `const char* const`, since it’s a never-modified pointer to unmodifiable memory. – Davislor Sep 07 '17 at 14:47
  • @眠りネロク I think it would be very much worth adding the information in that top-rated comment to your otherwise-excellent answer. – Dúthomhas Sep 08 '17 at 03:48
27

The first thing you need to learn about strings is that a string literal is really an array of read-only characters with a lifetime of the full program. That means it never ceases to exist: it is there throughout the execution of the program.

What the first function (function1) does is return a pointer to the first element of such an array.

With the second function (function2) things are a little bit different. Here the variable string is a local array within the function. As such it will go out of scope and cease to exist once the function returns. With this function you return a pointer to the first element of that array, but that pointer immediately becomes invalid since it points to something which no longer exists. Dereferencing it (which happens when you pass it to printf) will lead to undefined behavior.
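
One common way around this, if you need a modifiable copy, is to let the caller own the storage. The following is only a sketch under that assumption; `fill_greeting` and its parameters are hypothetical names, not something from the question.

```c
#include <string.h>

/* Copies the greeting into storage owned by the caller.
   Returns buf on success, or NULL if buf is NULL or too small. */
char *fill_greeting(char *buf, size_t size)
{
    static const char text[] = "Hello, World!";

    if (buf == NULL || size < sizeof text)
        return NULL;

    /* sizeof text counts the terminating '\0', so it is copied too. */
    memcpy(buf, text, sizeof text);
    return buf;
}
```

A caller would declare something like `char buffer[32];` in `main`, call `fill_greeting(buffer, sizeof buffer)`, and print the result. The array then lives as long as the caller's own scope, so the returned pointer stays valid while it is being used there.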

Some programmer dude
  • 5
    To elaborate: In `function2` there also exists a read-only string that will continue to exist after the function exits. But you are not returning a pointer to it. What happens is that you copy the data from that string to a local variable and then return a pointer to *that*. So the string still exists somewhere, just not at the location your pointer points to. – ComicSansMS Sep 07 '17 at 10:57
  • 2
    @ComicSansMS there might not be a read-only string. Compilers tend to optimize such assignments into moving few numbers instead of copying string. – bezet Sep 07 '17 at 14:51
  • 1
    @bezet I suppose the answer may be more precisely phrased by using the "as if" terminology that the spec is fond of, but I think that in this case we can get away without it. There is such a read-only string in the "mind" of the compiler as it interprets your code, whether that string ever gets emitted into the executable verbatim is to be determined by the optimizer. – Cort Ammon Sep 07 '17 at 21:01
7

A very important thing to remember when coding in C or other stack-based languages is that when a function returns, it (and all its local storage) is gone. This means that if you want someone else to be able to see the results of your function's hard work, you have to put them somewhere that will still exist after your function has ceased to, and to do that you need to understand where C stores things and how.

You probably already know how an array operates in C. It is just a memory address that is incremented by the size of the object. You probably also know that C does not do bounds checking, so if you access the 11th element of a ten-element array, no one is going to stop you; it is still undefined behavior, even if you only read and the program appears to work. What you may not know is that, on typical implementations, C extends this idea to the way it handles functions and variables. Each function call gets an area of memory on the stack, and the storage for its local variables is just offsets from that location. Your function returned a pointer to a local variable, specifically the address of the stack location that held the 'H' of "Hello, World!\0". But when you then called another function (printf), that memory was free to be reused by printf for whatever it needed. On some compilers and settings you can see this easily enough (DO NOT DO THIS IN PRODUCTION CODE!!!):

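char* foo2 = function2();  // foo2 is a dangling pointer from here on
char ch = foo2[0];  // Do not do this in live code! Undefined behavior
printf("%s\n", foo2);  // stack once used by function2 may now be reused by printf()
printf("ch is %c\n", ch);  // may print 'H', but nothing guarantees it (it may also crash)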
Paul Smith
  • Yes, and even if it sorta works with static strings, it's a really bad habit to get into (IMHO, anyway). – jamesqf Sep 08 '17 at 03:31
5

I thought the return value of both of the functions would be undefined since they are returning data that is out of scope.

No. That's not the case.

In function function1 you are returning a pointer to a string literal. Returning a pointer to a string literal is fine because string literals have static storage duration. But that's not true of automatic local variables.

In function function2 the array string is an automatic local variable and the statement

return string; 

returns a pointer to an automatic local variable. Once the function returns, the variable string will no longer exist. Dereferencing the returned pointer will cause undefined behavior.
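
If the function really has to hand back a fresh, modifiable string, allocated storage is one option, because it lasts until `free()` is called. This is a minimal sketch; `make_greeting` is a hypothetical name chosen for the example.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a heap-allocated copy of the greeting, or NULL if malloc fails.
   The copy outlives this function; the caller is responsible for free(). */
char *make_greeting(void)
{
    const char text[] = "Hello, World!";  /* automatic array, used only inside this call */
    char *copy = malloc(sizeof text);     /* sizeof text includes the '\0' */

    if (copy != NULL)
        memcpy(copy, text, sizeof text);
    return copy;
}
```

The caller checks the result for `NULL`, uses the string, and then calls `free()` on it exactly once. The local array `text` still disappears when the function returns, but by then its contents have been copied to storage that does not.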

haccks
1

"Hello, World!" is a string literal, which has a static storage duration, so the problem is elsewhere. Your first function returns the value of string, which is fine. The second function however returns the address of a local variable (string is the same as &string[0]), resulting in undefined behavior. Your second printf statement could print nothing, or "Hello, World!", or something else entirely. On my machine, the program just gets a segmentation fault.

Always take a look at messages your compiler outputs. For your example, gcc gives:

file.c:12:12: warning: function returns address of local variable [-Wreturn-local-addr]
    return string; 
           ^

which is pretty much self-explanatory.

Dmitry Grigoryev
0

I thought the return value of both of the functions would be undefined since they are returning data that is out of scope.

Both functions return a pointer. What matters is the lifetime (storage duration) of the referent.

In function1, the referent is the string literal "Hello, World!", which has static storage duration. string is a local variable which points to that string, and conceptually, a copy of that pointer is returned (in practice, the compiler will avoid unnecessarily copying the value).

In function2, conceptually the referent is the local array string, which has been automatically sized (at compile time) to be big enough to hold the string literal (including a null terminator, of course), and been initialized with a copy of the string. The function would return a pointer to that array, except that the array has automatic storage duration and thus no longer exists after exiting the function (it is indeed "out of scope", in more familiar terminology). Since this is undefined behaviour, the compiler may in practice do all sorts of things.
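
The pointer-versus-array distinction also shows up in `sizeof`. The sketch below uses two hypothetical helper functions, `array_size` and `pointer_size`, that mirror the two declarations from the question.

```c
#include <stddef.h>

/* Mirrors function2: the array is sized at compile time to hold the
   13 characters of the literal plus the terminating '\0'. */
size_t array_size(void)
{
    char string[] = "Hello, World!";
    return sizeof string;
}

/* Mirrors function1: this is the size of the pointer variable itself,
   which says nothing about the length of the string it points to. */
size_t pointer_size(void)
{
    const char *string = "Hello, World!";
    return sizeof string;
}
```

`array_size()` yields 14, while `pointer_size()` yields `sizeof(char *)`, whatever that is on the platform in question (commonly 4 or 8).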

Does that mean that all char* are static?

Again, you need to distinguish between the pointer and the referent. Pointers point at data; they don't themselves "contain" the data.

You have reached a point where you should properly study what arrays and pointers actually are in C - unfortunately, it's a bit of a mess. The best reference I can offer offhand is this, in Q&A format.

Karl Knechtel