3

Why can I return from a function an array set up by malloc:

int *dog = (int*)malloc(n * sizeof(int));

but not an array set up by

 int cat[3] = {0,0,0};

The "cat[ ]" array is returned with a Warning.

Thanks all for your help

Kristen Martinson
  • 2
    You should only get the warning if `cat` were declared inside the function you are trying to return it from. If you have declared cat with the `[]` syntax as a global variable, it can be returned. It's a technical but important point. – Ray Toal Jul 24 '11 at 04:21
  • Thanks ray, here is a point at you – Kristen Martinson Jul 24 '11 at 04:22
  • Daniel Hicks hit it on the head: an "automatic variable" (like your array declared inside a function) ONLY EXISTS FOR THE SCOPE IT WAS CREATED IN. Effectively, it "disappears" when you exit the function. And Bad Things can happen if you try to use a (now invalid) reference to the (now deallocated) array. – paulsm4 Jul 24 '11 at 04:26
  • We ask this question of candidates for programming positions all the time. Only a small fraction can answer it and adequately explain why it works the way it does. – user47559 Jul 24 '11 at 04:48
  • See http://stackoverflow.com/questions/6441218/can-a-local-variables-memory-be-accessed-outside-its-scope – Gowtham Jul 24 '11 at 05:48

7 Answers

8

This is a question of scope.

int cat[3]; // declares a local variable cat

Local variables versus malloc'd memory

Local variables exist on the stack. When this function returns, these local variables will be destroyed. At that point, the addresses used to store your array are recycled, so you cannot guarantee anything about their contents.

If you call malloc, you will be allocating from the heap, so the memory will persist beyond the life of your function.

If the function is supposed to return a pointer (in this case, a pointer-to-int which is the first address of the integer array), that pointer should point to good memory. Malloc is the way to ensure this.
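As a minimal sketch of that idea (the function name make_dogs is hypothetical and not from the question), a function can hand heap memory back to its caller like this. The caller then becomes responsible for calling free on the result:

```c
#include <stdlib.h>

/* Hypothetical helper: returns a heap-allocated array of n zeroed ints,
   or NULL if n is zero or the allocation fails.
   The caller owns the result and must free() it. */
int *make_dogs(size_t n) {
    if (n == 0) {
        return NULL;
    }
    int *dog = malloc(n * sizeof *dog);
    if (dog == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < n; i++) {
        dog[i] = 0;
    }
    return dog;   /* valid: the memory outlives this function call */
}
```

A caller would check the result against NULL before using it and pass it to free when done.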

Avoiding Malloc

You do not have to call malloc inside of your function (although it would be normal and appropriate to do so).

Alternatively, you could pass an address into your function which is supposed to hold these values. Your function would do the work of calculating the values and would fill the memory at the given address, and then it would return.

In fact, this is a common pattern. If you do this, however, you will find that you do not need to return the address, since you already know the address outside of the function you are calling. Because of this, it's more common to return a value which indicates the success or failure of the routine, like an int, than it is to return the address of the relevant data.

This way, the caller of the function can know whether or not the data was successfully populated or if an error occurred.

#include <stdio.h>             // include stdio for the printf function

int rainCats (int *cats);      // pass a pointer-to-int to function rainCats

int main (int argc, char *argv[]) {

    int cats[3];               // cats decays to the address of its first element

    int success;               // declare an int to store the success value
    success = rainCats(cats);  // pass the address to the function

    if (success == 0) {
        int i;
        for (i=0; i<3; i++) {
            printf("cat[%d] is %d\n", i, cats[i]);
        }
    }

    return 0;
}

int rainCats (int *cats) {
    int i;
    for (i=0; i<3; i++) {      // put a number in each element of the cats array
        cats[i] = i;
    }
    return 0;                  // return a zero to signify success
}

Why this works

Note that you never had to call malloc here because the cats array was declared inside the main function. The local variables in main are only destroyed when main returns, which is effectively when the program exits. In anything but a very simple program, malloc is typically used to create data structures and control their lifespan.

Also notice that rainCats is hard-coded to return 0. Nothing happens inside of rainCats which would make it fail, such as attempting to access a file, a network request, or other memory allocations. More complex programs have many reasons for failing, so there is often a good reason for returning a success code.
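For illustration only, here is a sketch of a variant that can actually fail. The name rainCatsChecked and the count parameter are assumptions added for this example, not part of the code above:

```c
#include <stddef.h>

/* Hypothetical variant of rainCats that validates its arguments.
   Returns 0 on success, -1 if the buffer is missing or empty. */
int rainCatsChecked(int *cats, size_t count) {
    if (cats == NULL || count == 0) {
        return -1;             /* nothing to fill: report failure */
    }
    for (size_t i = 0; i < count; i++) {
        cats[i] = (int)i;      /* same fill pattern as rainCats */
    }
    return 0;                  /* zero signifies success */
}
```

The caller tests the return value exactly as main does with success == 0, and only reads the array when the call succeeded.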

keparo
4

Because int cat[3] = {0,0,0}; is declaring an automatic variable that only exists while the function is being called.

There is a special "dispensation" in C for inited automatic arrays of char, so that quoted strings can be returned, but it doesn't generalize to other array types.

Hot Licks
  • @Kristen, no matter what you do, returning it "in this format" will crash your program. You're better off finding another way to do what you want. – zneak Jul 24 '11 at 04:22
  • 1
    Can you clarify this "special dispensation"? If you make an array `char hello[12] = "hello world"` it is still automatic, and will still go away. – Dietrich Epp Jul 24 '11 at 04:29
  • 3
    @Daniel R Hicks, special "dispensation" is a bit strong. The C string is essentially a `static const`, and so may be returned. Similarly, `static int cat[3] = {0,0,0}` would work. It is important to realize that the function returns the same array no matter how many times it is called. – andrewdski Jul 24 '11 at 04:34
  • 3
    There's no "special dispensation". For char *str = "hello world", "hello world" is placed in the .data section of the executable, not on the stack. str is on the stack, but it's a pointer to constant data. The address of "hello world" will still be meaningful after a return, even though the pointer to it has been clobbered. – user47559 Jul 24 '11 at 04:41
  • Yeah, on reflection you guys are right, mostly. I definitely remember that there is a special dispensation in there somewhere, for string literals, but it's probably not relevant in this case. Certainly, there tends to be confusion between, say, cat[] and cat* -- they're interchangeable in many cases but are NOT the same, and that's the root of the OP's problem in this case. – Hot Licks Jul 24 '11 at 12:42
  • @Daniel So you're saying that if he did int cat* = {0,0,0};, it would be different? – Leif Andersen Jul 24 '11 at 16:12
  • It's been at least 20 years since I read the spec. I'm thinking the rule is that a string initializer is always a static, but an initializer such as `{0,0,0}` isn't guaranteed to be (though would be in practice for such a small literal) -- it can be auto space that's initialized by code or some such. But like I said, it's been at least 20 years, so the details are fuzzy (and ANSI may have changed the spec). – Hot Licks Jul 24 '11 at 18:20
4

There are two key parts of memory in a running program: the stack, and the heap. The stack is also referred to as the call stack.

When you make a function call, information about the parameters, where to return, and all the variables defined in the scope of the function are pushed onto the stack. (It used to be the case that C variables could only be defined at the beginning of the function. Mostly because it made life easier for the compiler writers.)

When you return from a function, everything on the stack is popped off and is gone (and soon when you make some more function calls you'll overwrite that memory, so you don't want to be pointing at it!)

Any time you allocate memory with malloc, you are allocating it from the heap. That's some other part of memory, maintained by the allocation manager. Once you "reserve" part of it, you are responsible for it, and if you want to stop pointing at it, you're supposed to let the manager know. If you drop the pointer and can't ask to have it released any more, that's a leak.
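A small sketch of that responsibility, using a hypothetical function sumOfSquares that is not part of the original answer: the buffer is reserved, used, and handed back to the allocator before the only pointer to it goes away.

```c
#include <stdlib.h>

/* Hypothetical example: sums the squares of 0..n-1 using a temporary
   heap buffer. Returns -1 if the allocation fails. */
long sumOfSquares(size_t n) {
    if (n == 0) {
        return 0;
    }
    long *buf = malloc(n * sizeof *buf);   /* reserve heap memory */
    if (buf == NULL) {
        return -1;
    }
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        buf[i] = (long)(i * i);
        total += buf[i];
    }
    free(buf);   /* release it; returning without this call would leak */
    return total;
}
```

If the free call were omitted, buf would go out of scope at the return and nothing could release the memory afterwards.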

You're also supposed to only look at the part of memory you said you wanted. Overwriting not just the part you said you wanted, but past (or before) that part of memory is a classic technique for exploits: writing information into part of memory that is holding computer instructions instead of data. Knowledge of how the compiler and the runtime manage things helps experts figure out how to do this. Well designed operating systems prevent them from doing that.

heap:

int *dog = (int*)malloc(n*sizeof(int));

stack:

int cat[3] = {0,0,0};
bshirley
  • This explains why when I clock an array defined by "cat array" it is much slower to work with than the "dog array": too much "pushing" and "popping" – Kristen Martinson Jul 24 '11 at 05:41
  • working with pointers _is_ more efficient, but the amount of variable space you use in a function is of little effect. in assembly language parlance `sp` is often used to refer to the stack pointer, and performing `sp = sp + 8` or `sp = sp + 40` is not any different - - - how you use that memory, that can be very different - - - most importantly, you should limit the scope of any variable as much as is appropriate - - - don't try to outsmart the compiler, you will always lose – bshirley Jul 24 '11 at 06:32
1

cat[] is allocated on the stack of the function you are calling. When the function returns, that stack frame should be considered freed, and the array's memory along with it.

If what you want to do is populate an array of ints in the calling frame, pass in a pointer to an array that you control from the calling frame:

void findMyCats(int *cats);   // declare before use

void somefunction() {
  int cats[3];
  findMyCats(cats);           // cats decays to a pointer to its first element
}

void findMyCats(int *cats) {
  cats[0] = 0;
  cats[1] = 0;
  cats[2] = 0;
}

Of course this is contrived, and I've hardcoded that the array length is 3, but this is what you have to do to get array data back from an invoked function.

A single value works because it's copied back to the calling frame:

int findACat() {
  int cat = 3;
  return cat;
}

In findACat, 3 is copied from findACat to the calling frame. Since it's a known quantity, the compiler can do that for you. The data a pointer points to can't be copied because the compiler does not know how much to copy.
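One way to see the "known quantity" point is the following sketch. The struct Cats type and findThreeCats are hypothetical names introduced here, not part of the original answer. When the array is a struct member, its size is part of the type, so C can copy the whole thing back by value:

```c
/* Hypothetical wrapper type: the array's size is part of the struct type. */
struct Cats {
    int values[3];
};

struct Cats findThreeCats(void) {
    struct Cats cats = { {1, 2, 3} };
    return cats;   /* the whole struct, array included, is copied to the caller */
}
```

The caller receives its own copy, so nothing points into the callee's dead stack frame. For large arrays the copy can be costly, which is one reason passing a pointer in is more common.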

Bill Dudney
1

When you define a variable like 'cat' the compiler assigns it an address. The association between the name and the address is only valid within the scope of the definition. In the case of auto variables that scope is the function body from the point of definition onwards.

Auto variables are allocated on the stack. The same address on the stack is associated with different variables at different times. When you return an array, what is actually returned is the address of the first element of the array. Unfortunately, after the return, the compiler can and will reuse that storage for completely unrelated purposes. What you'd see at a source code level would be your returned variable mysteriously changing for no apparent reason.

Now, if you really must return an initialized array, you can declare that array as static. A static variable has a permanent rather than a temporary storage allocation. You'll need to keep in mind that the same memory will be used by successive calls to the function, so the results from the previous call may need to be copied somewhere else before making the next call.
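A minimal sketch of the static approach and its caveat, using a hypothetical function nextCats that is not from the question:

```c
/* Hypothetical example: every call returns a pointer to the SAME static array. */
int *nextCats(int start) {
    static int cat[3];          /* permanent storage, shared by all calls */
    for (int i = 0; i < 3; i++) {
        cat[i] = start + i;
    }
    return cat;                 /* valid after return, but overwritten by the next call */
}
```

If a caller keeps the pointer from nextCats(1) and then calls nextCats(10), the first pointer now sees 10, 11, 12, because both pointers refer to the same storage. Copy the values out first if you need to keep them.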

Another approach is to pass the array in as an argument and write into it in your function. The calling function then owns the variable, and the issues with stack variables don't arise.

None of this will make much sense unless you carefully study how the stack works. Good luck.

phunctor
1

You cannot return an array. You are returning a pointer. This is not the same thing.

You can return a pointer to the memory allocated by malloc() because malloc() has allocated the memory and reserved it for use by your program until you explicitly use free() to deallocate it.

You may not return a pointer to the memory allocated by a local array because as soon as the function ends, the local array no longer exists.

Karl Knechtel
1

This is a question of object lifetime - not scope or stack or heap. While those terms are related to the lifetime of an object, they aren't equivalent to lifetime, and it's the lifetime of the object that you're returning that's important. For example, a dynamically allocated object has a lifetime that extends from allocation to deallocation. A local variable's lifetime might end when the scope of the variable ends, but if it's static its lifetime won't end there.

The lifetime of an object that has been allocated with malloc() is until that object has been freed using the free() function. Therefore when you create an object using malloc(), you can legitimately return the pointer to that object as long as you haven't freed it - it will still be alive when the function ends. In fact you should take care to do something with the pointer so it gets remembered somewhere or it will result in a leak.

The lifetime of an automatic variable ends when the scope of the variable ends (so scope is related to lifetime). Therefore, it doesn't make sense to return a pointer to such an object from a function - the pointer will be invalid as soon as the function returns.

Now, if your local variable is static instead of automatic, then its lifetime extends beyond the scope that it's in (therefore scope is not equivalent to lifetime). So if a function has a local static variable, the object will still be alive even when the function has returned, and it would be legitimate to return a pointer to a static array from your function. Though that brings in a whole new set of problems because there's only one instance of that object, so returning it multiple times from the function can cause problems with sharing the data (it basically only works if the data doesn't change after initialization or there are clear rules for when it can and cannot change).

Another example taken from another answer here is regarding string literals - pointers to them can be returned from a function not because of a scoping rule, but because of a rule that says that string literals have a lifetime that extends until the program ends.
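As a sketch of the string literal case (catName is a hypothetical function name used only for illustration):

```c
/* Hypothetical example: returning a pointer to a string literal is fine,
   because the literal has static storage duration. */
const char *catName(void) {
    return "Tom";   /* the literal lives until the program ends */
}

/* By contrast, a local array such as
       char name[4] = "Tom";
   is an automatic object that merely copies the literal's characters,
   so returning a pointer to it would be invalid. */
```

The const qualifier on the return type is a deliberate choice here: modifying a string literal is undefined behavior, so the caller should treat the result as read-only.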

Michael Burr