So I'm a bit confused on how to make a function that will return a pointer to an array of ints in C. I understand that you cannot do:

int* myFunction() {
  int myInt[aDefinedSize];
  return myInt; }

because this is returning a pointer to a local variable. So, I thought about this:

int* myFunction(){
  int* myInt = (int) malloc(aDefinedSize * sizeof(int));
  return myInt; }

This gives the warning: cast from pointer to integer of different size. That suggests using this instead, which works:

int* myFunction(){
  int* myInt = (int*) malloc(aDefinedSize * sizeof(int));
  return myInt; }

What I'm confused by, though, is this: the (int*) before the malloc was explained to me like this: it tells the compiler what the datatype of the memory being allocated is. This is then used when, for example, you are stepping through the array and the compiler needs to know how many bytes to increment by. So, if this explanation I was given is correct, isn't memory being allocated for aDefinedSize number of pointers to ints, not actually ints? Thus, isn't myInt a pointer to an array of pointers to ints? Some help in understanding this would be wonderful. Thanks!!

StoryTeller - Unslander Monica
teddyv
  • Don't cast the return of `malloc`! Your code is fine apart from this. – Jean-François Fabre Apr 09 '17 at 13:00
  • See this: http://stackoverflow.com/q/605845/4996248 – John Coleman Apr 09 '17 at 13:00
  • Please don't "save lines" by appending the `}` to the last line of a function. It's been known to cause seizures in veteran programmers. – StoryTeller - Unslander Monica Apr 09 '17 at 13:01
  • `int* myInt` tells the compiler what the data type is. *isn't memory being allocated for aDefinedSize number of pointers to ints?* No, `malloc` only knows about bytes, has no idea what you want them for. – Weather Vane Apr 09 '17 at 13:02
  • If you forget to include `<stdlib.h>`, then `malloc` is implicitly declared as returning an `int`, and if you don't cast its return value the compiler will print a warning, so you know you forgot to include it. – Fryz Apr 09 '17 at 13:06
  • Weather Vane, since I declare myInt as a pointer to an integer (or an array of them), doesn't this signify to step through the memory in pointer-sized chunks, as opposed to int-sized ones? – teddyv Apr 09 '17 at 13:11
  • Clearly no...... – Karoly Horvath Apr 09 '17 at 13:12
  • Related: [*function that will return a pointer to an array of ints*](http://stackoverflow.com/q/17882070/509868) – anatolyg Apr 09 '17 at 13:26
  • "it tells the compiler what the datatype of the memory being allocated is" - Whoever told you so should learn the language. That is not how C works and plain nonsense. Don't cast the result of `malloc` & friends or `void *` in general. The cast is potentially even harmful if it is to the correct type. – too honest for this site Apr 09 '17 at 14:02
  • When the compiler turns an array access into machine code, a statement like `a[i] = 4;` will be turned into a machine-language instruction or instructions that multiply `i` by `sizeof(int)` to get an offset into your array, add the base address `a` to get the address of `a[i]`, and then store the value on the right-hand side of the assignment at that address. The compiler needs to know the size of an `int` to do that, and it also needs to know the size of each element to determine how much memory it needs to store `n` elements. Knowing the type also lets the compiler catch type errors. – Davislor Apr 09 '17 at 14:05
  • @Davislor please keep implementation details out of this. The standard does not state how elements are accessed by the machine. – too honest for this site Apr 09 '17 at 14:28
  • @Olaf I advise you to review section 6.5.2.1 of the draft standard you just linked to me: “The definition of the subscript operator [] is that E1[E2] is identical to (*((E1)+(E2))). Because of the conversion rules that apply to the binary + operator, if E1 is an array object (equivalently, a pointer to the initial element of an array object) and E2 is an integer, E1[E2] designates the E2-th element of E1 (counting from zero).” It also says that array elements are contiguous in memory. See also the definition of “object representation.” – Davislor Apr 09 '17 at 16:14
  • However, it is very possible that a compiler could optimize the program to generate code *with the same effect* as writing to that memory address and then doing other stuff with it, but in a more efficient way. – Davislor Apr 09 '17 at 16:20
  • @Davislor: How is that related to what I wrote? There is absolutely no use in speculating how it **could** be implemented. The address could also be a key into a database or anything else generating the same _observable behaviour_. It is irrelevant for the question. – too honest for this site Apr 09 '17 at 16:33

3 Answers

> So, if this explanation I was given is correct, isn't memory being allocated for aDefinedSize number of pointers to ints, not actually ints?

No. You asked malloc for aDefinedSize * sizeof(int) bytes, not aDefinedSize * sizeof(int *) bytes. That is how much memory you get; how it is interpreted depends on the type of the pointer used to access it.
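
As a minimal sketch of that point (the macro `N_ELEMS` and both function names are hypothetical stand-ins, not from the question), the byte count handed to malloc is built from the size of an int, and the elements behave as ints when accessed through an `int *`:

```c
#include <assert.h>
#include <stdlib.h>

#define N_ELEMS 10  /* hypothetical stand-in for aDefinedSize */

/* Allocates room for N_ELEMS ints; the caller must free() the result. */
int *make_ints(void)
{
    /* sizeof *p is sizeof(int), because p is declared as int *. */
    int *p = malloc(N_ELEMS * sizeof *p);
    return p;  /* NULL if the allocation failed */
}

/* Fills and reads back the block to show it holds N_ELEMS ints.
   Returns 0 on success, -1 if the allocation failed. */
int use_ints(void)
{
    int *p = make_ints();
    if (p == NULL)
        return -1;
    for (int i = 0; i < N_ELEMS; i++)
        p[i] = i * i;                     /* each p[i] is an int, not a pointer */
    assert(p[3] == 9);
    assert(sizeof p[0] == sizeof(int));   /* element size comes from the pointer type */
    free(p);
    return 0;
}
```

Writing `sizeof *p` instead of `sizeof(int)` is one common way to keep the element size tied to the pointer's declared type.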

> Thus, isn't myInt a pointer to an array of pointers to ints?

No, since you defined it as an int *, a pointer-to-an-int.

Of course, the pointer has no knowledge of how large the allocated memory area is; it only points at the first int that fits there. It's up to you as the programmer to keep track of the size.
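
One possible way to do that bookkeeping, sketched here under the assumption that a small helper type is acceptable (`struct int_buffer` and its functions are hypothetical names, not part of any library), is to store the element count next to the pointer:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper type: a pointer together with its element count. */
struct int_buffer {
    int   *data;   /* first element, or NULL if the allocation failed */
    size_t len;    /* number of ints that data points to */
};

/* Allocates n ints and records n next to the pointer. */
struct int_buffer int_buffer_make(size_t n)
{
    struct int_buffer b;
    b.data = malloc(n * sizeof *b.data);
    b.len  = (b.data != NULL) ? n : 0;
    return b;
}

/* Sums the elements, using the stored length as the loop bound. */
long int_buffer_sum(const struct int_buffer *b)
{
    assert(b != NULL);
    long total = 0;
    for (size_t i = 0; i < b->len; i++)
        total += b->data[i];
    return total;
}

/* Releases the memory and resets the bookkeeping. */
void int_buffer_free(struct int_buffer *b)
{
    free(b->data);
    b->data = NULL;
    b->len  = 0;
}
```

Passing a separate `size_t` argument alongside the pointer works just as well; the struct only keeps the two from drifting apart.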

Note that you shouldn't use that explicit typecast. malloc returns a void *, which converts implicitly to any object pointer type, as here:

int* myInt = malloc(aDefinedSize * sizeof(int));

Arithmetic on the pointer works in strides of the pointed-to type, i.e. with int *p, p[3] is the same as *(p+3), which means roughly "go to p, go forward three times sizeof(int) in bytes, and access that location". int **q would be a pointer-to-a-pointer-to-an-int, and might point to an array of pointers.
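
The stride can be observed directly. This is only an illustrative sketch (the function name `int_stride_bytes` is made up for the example): it compares the two addresses as `char *` so that the difference comes out in bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the distance in bytes between p and p + 3 for an int pointer. */
size_t int_stride_bytes(void)
{
    int a[4] = {10, 20, 30, 40};
    int *p = a;

    /* p[3] and *(p + 3) are two spellings of the same element. */
    assert(p[3] == *(p + 3));
    assert(p[3] == 40);

    /* Viewing both addresses as char * exposes the step in bytes. */
    return (size_t)((char *)(p + 3) - (char *)p);
}
```

The returned value equals `3 * sizeof(int)`, whatever the size of an int happens to be on the platform.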

Community
ilkkachu
  • At one point, I had an extended discussion of pointers-to-pointers in my answer, but edited it out as a tangent. Basically, arrays of pointers are almost never the kind of “two-dimensional array” you want. It’s unfortunate that beginners always learn about `argv` first and try to imitate it. The only advantage over an array of arrays is that you might save memory by not storing entire rows, but there are better data structures to store sparse matrices, such as compressed sparse row. – Davislor Apr 09 '17 at 14:14
  • Thank you. On the topic of pointers-to-pointers, if I were to initialize an array of pointers using `int** myArr = malloc(numElements * sizeof(int*))`, then am I correct in stating that incrementing through the array with `myArr++` would step through memory in chunks the size of an int pointer, not an int? – teddyv Apr 09 '17 at 17:38
  • Not sure whether that was directed at @ikkachu or me, or whoever got to it first, but you are correct. That will get you (a pointer to the start of) an array of pointers to `int`. – Davislor Apr 10 '17 at 08:03
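
A small sketch of the array-of-pointers case from these comments, with hypothetical names (`sum_via_pointer_array`, `my_arr`, `values`): a moving `int **` cursor advances one pointer at a time, and each pointer is dereferenced a second time to reach the int.

```c
#include <assert.h>
#include <stdlib.h>

/* Builds an array of num_elements pointers, each pointing at its own int,
   then walks it with a moving int ** cursor. Returns the sum of the ints,
   or -1 if an allocation failed. */
long sum_via_pointer_array(size_t num_elements)
{
    int **my_arr = malloc(num_elements * sizeof *my_arr);  /* sizeof(int *) each */
    int *values  = malloc(num_elements * sizeof *values);  /* sizeof(int) each */
    long total = -1;

    assert(sizeof *my_arr == sizeof(int *));

    if (my_arr != NULL && values != NULL) {
        total = 0;
        for (size_t i = 0; i < num_elements; i++) {
            values[i] = (int)i + 1;
            my_arr[i] = &values[i];
        }
        /* cursor++ advances by one int * (sizeof(int *) bytes), not by one int. */
        for (int **cursor = my_arr; cursor != my_arr + num_elements; cursor++)
            total += **cursor;
    }
    free(values);
    free(my_arr);
    return total;
}
```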

malloc allocates an array of bytes and returns a void* pointing to the first byte, or NULL if the allocation failed.

To treat this array as an array of a different data type, the pointer must be converted to a pointer to that data type.

In C, void* converts implicitly to any object pointer type, so no explicit cast is required:

int* allocateIntArray(unsigned number_of_elements) {
    int* int_array = malloc(number_of_elements * sizeof(int)); // <--- no cast is required here.
    return int_array;
}
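
Since malloc can return NULL, a caller would normally test the result before using it. The following is a sketch only; `allocate_int_array_checked` and `last_element_demo` are hypothetical names, and the overflow guard is an addition that the answer's function does not have.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical variant of allocateIntArray that also guards the size math. */
int *allocate_int_array_checked(size_t number_of_elements)
{
    /* Refuse requests whose byte count would overflow size_t. */
    if (number_of_elements > SIZE_MAX / sizeof(int))
        return NULL;
    return malloc(number_of_elements * sizeof(int));  /* NULL on failure */
}

/* Allocates n ints, fills them with 0..n-1, and returns the last element,
   or -1 if the allocation failed. */
int last_element_demo(size_t n)
{
    assert(n > 0);
    int *arr = allocate_int_array_checked(n);
    if (arr == NULL)          /* always test before dereferencing */
        return -1;
    for (size_t i = 0; i < n; i++)
        arr[i] = (int)i;
    int last = arr[n - 1];
    free(arr);
    return last;
}
```
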
Maxim Egorushkin

Arrays in C

In C, you want to remember that an array is just an address in memory, plus a length and an object type. When you pass it as an argument to a function or return it from a function, the length gets forgotten and it’s treated interchangeably with the address of the first element. This has led to a lot of security bugs in programs that either read or write past the end of a buffer.

The name of an array automatically converts to the address of its first element in most contexts, so you can for example pass either arrays or pointers to memmove(), but there are a few exceptions where the fact it also has a length matters. The sizeof() operator on an array is the number of bytes in the array, but sizeof() a pointer is the size of a pointer variable. So if we declare int a[SIZE];, sizeof(a) is the same as sizeof(int)*(size_t)(SIZE), whereas sizeof(&a[0]) is the same as sizeof(int*). Another important one is that the compiler can often tell at compile time if an array access is out of bounds, whereas it does not know which accesses to a pointer are safe.
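
To make the sizeof difference concrete, here is a short sketch (the macro `SIZE` is given an arbitrary value and all function names are invented for the example). The same array reports its full size where its array type is visible and only a pointer's size once it has been passed to a function:

```c
#include <assert.h>
#include <stddef.h>

#define SIZE 8  /* hypothetical array length */

/* Inside the function that declares the array, sizeof sees the whole array. */
size_t bytes_seen_as_array(void)
{
    int a[SIZE];
    return sizeof(a);                    /* SIZE * sizeof(int) */
}

/* Once the array is passed as an argument, only a pointer arrives. */
static size_t bytes_seen_as_parameter(int *p)
{
    return sizeof(p);                    /* sizeof(int *); the length is gone */
}

size_t bytes_seen_after_decay(void)
{
    int a[SIZE] = {0};
    assert(sizeof(a) == SIZE * sizeof(int));
    return bytes_seen_as_parameter(a);   /* a converts to &a[0] here */
}

/* The element count is recoverable only while the array type is visible. */
size_t element_count(void)
{
    int a[SIZE];
    return sizeof(a) / sizeof(a[0]);
}
```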

How to Return an Array

If you want to return a pointer to the same, static array, and it’s fine that you’ll get the same array each time you call the function, you can do this:

#define ARRAY_SIZE 32U

int* get_static_array(void)
{
  static int the_array[ARRAY_SIZE];
  return the_array;
}

You must not call free() on a static array.

If you want to create a dynamic array, you can do something like this, although it is a contrived example:

#include <stdlib.h>

int* make_dynamic_array(size_t n)
// Returns an array that you must free with free().
{
  return calloc( n, sizeof(int) );
}

The dynamic array must be freed with free() when you no longer need it, or the program will leak memory.
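
A caller might pair the allocation with its free() like this. It is a sketch: `sum_first_n` is a hypothetical caller, and `make_dynamic_array` is repeated only so the block compiles on its own.

```c
#include <assert.h>
#include <stdlib.h>

/* Same shape as make_dynamic_array above: n zeroed ints, or NULL. */
static int *make_dynamic_array(size_t n)
{
    return calloc(n, sizeof(int));
}

/* Hypothetical caller: allocates, uses, and frees the array.
   Returns 1 + 2 + ... + n, or -1 if the allocation failed. */
long sum_first_n(size_t n)
{
    int *a = make_dynamic_array(n);
    if (a == NULL)
        return -1;
    assert(n == 0 || a[0] == 0);   /* calloc zero-filled the block */

    long total = 0;
    for (size_t i = 0; i < n; i++) {
        a[i] = (int)i + 1;
        total += a[i];
    }
    free(a);                       /* exactly one free() per successful allocation */
    return total;
}
```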

Practical Advice

For anything that simple, you would actually write:

int * const p = calloc( n, sizeof(int) );

Unless for some reason the array pointer would change, such as:

int* p = calloc( n, sizeof(int) );
/* ... */
p = realloc( p, new_size );
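
One caveat when resizing: realloc takes a size in bytes, and if it fails it returns NULL while leaving the old block allocated, so assigning its result straight back to `p` would lose the only pointer to that block. A hedged sketch of a common pattern, using a hypothetical helper named `resize_int_array`:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: resizes *pp to hold new_count ints.
   Returns 0 on success; on failure returns -1 and leaves *pp valid. */
int resize_int_array(int **pp, size_t new_count)
{
    assert(pp != NULL && new_count > 0);

    /* realloc takes a size in bytes, so scale the element count. */
    int *tmp = realloc(*pp, new_count * sizeof **pp);
    if (tmp == NULL)
        return -1;      /* the old block is untouched and still owned by *pp */
    *pp = tmp;
    return 0;
}
```

Elements added by growing the block are not initialized, even if the original block came from calloc().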

I would recommend calloc() over malloc() as a general rule, because it initializes the block of memory to zeroes, and malloc() leaves the contents unspecified. That means, if you have a bug where you read uninitialized memory, using calloc() will always give you predictable, reproducible results, and using malloc() could give you different undefined behavior each time. In particular, if you allocate a pointer and then dereference it on an implementation where 0 is a trap value for pointers (like typical desktop CPUs), a pointer created by calloc() will always give you a segfault immediately, while a garbage pointer created by malloc() might appear to work, but corrupt any part of memory. That kind of bug is a lot harder to track down. It’s also easier to see in the debugger that memory is or is not zeroed out than whether an arbitrary value is valid or garbage.
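
For integer elements the zero-fill is something you can rely on, since an all-bits-zero object of any integer type has the value zero. The check below is only a sketch with a made-up function name; as the comments under this answer point out, all-bits-zero is not guaranteed to be a null pointer or a floating-point 0.0 on every implementation.

```c
#include <assert.h>
#include <stdlib.h>

/* Returns 1 if all n ints from calloc() read as zero, 0 if not,
   and -1 if the allocation failed. */
int calloc_ints_are_zero(size_t n)
{
    assert(n > 0);
    int *p = calloc(n, sizeof *p);
    if (p == NULL)
        return -1;

    int all_zero = 1;
    for (size_t i = 0; i < n; i++)
        if (p[i] != 0)
            all_zero = 0;

    free(p);
    return all_zero;
}
```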

Further Discussion

In the comments, one person objects to some of the terminology I used. In particular, C++ offers a few different ways to return a reference to an array that preserve more information about its type, for example:

#include <array>
#include <cstdlib>

using std::size_t;

constexpr size_t size = 16U;
using int_array = int[size];

int_array& get_static_array()
{
  static int the_array[size];
  return the_array;
}

std::array<int, size>& get_static_std_array()
{
  static std::array<int, size> the_array;
  return the_array;
}

So, one commenter (if I understand correctly) objects that the phrase “return an array” should only refer to this kind of function. I use the phrase more broadly than that, but I hope that clarifies what happens when you return the_array; in C. You get back a pointer. The relevance to you is that you lose the information about the size of the array, which makes it very easy to write security bugs in C that read or write past the block of memory allocated for an array.

There was also some kind of objection that I shouldn’t have told you that using calloc() instead of malloc() to dynamically allocate structures and arrays that contain pointers will make almost all modern CPUs segfault if you dereference those pointers before you initialize them. For the record: this is not true of absolutely all CPUs, so it’s not portable behavior. Some CPUs will not trap. Some old mainframes will trap on a special pointer value other than zero. However, it’s come in very handy when I’ve coded on a desktop or workstation. Even if you’re running on one of the exceptions, at least your pointers will have the same value each time, which should make the bug more reproducible, and when you debug and look at the pointer, it will be immediately obvious that it’s zero, whereas it will not be immediately obvious that a pointer is garbage.

Davislor
  • To clarify: You cannot pass "an array" to or from a function. Nor does one typically return a pointer to the array. Instead a pointer **to the first element** is passed. – too honest for this site Apr 09 '17 at 14:04
  • When you give a function the name of an array as an argument, or return the name of an array from your function as in the first example, what C does is pass a pointer to the first element of the array, yes. You could equivalently write `a`, `&a` or `&a[0]` in those contexts. – Davislor Apr 09 '17 at 14:10
  • I’ve edited to say, “return an array by reference,” which I think somebody who objects to the wording “return an array” in C would accept. – Davislor Apr 09 '17 at 14:18
  • He would not. C does not support references. A pointer is a first-class type, references are not. – too honest for this site Apr 09 '17 at 14:25
  • What happens when you pass or return the name of an array is what computer scientists call pass-by-reference. You can even declare your function as `frotz_the_array( size_t n, int to_be_frotzed[n] )` and the function will see `to_be_frotzed` as an in-out array parameter. – Davislor Apr 09 '17 at 14:29
  • Anyway, OP, this discussion about terminology has no practical relevance to you. – Davislor Apr 09 '17 at 14:31
  • Did you seriously downvote this answer because I used the word *return* to describe what the statement `return the_array;` does? – Davislor Apr 09 '17 at 14:34
  • After a closer reading of your answer: `calloc` sets all bytes to zero. But that does not guarantee pointers to be _null pointers_, nor floats to be `0.0`. Which one to prefer depends on the application, recommending `calloc` is not good advice. And `0` is always a _null pointer constant_ in pointer context in C. And most modern CPUs don't catch null-pointer dereferencing. x86 and ARMv7A/8 are not the majority of CPUs. Already most ARMv7M CPUs (which outnumber the former by magnitudes) don't by default. Segfault is no guarantee by the standard. – too honest for this site Apr 09 '17 at 14:37
  • And that is why I did not say that all-bits-zero is guaranteed to be NULL or that the undefined behavior of dereferencing NULL is guaranteed to be a segfault. I said that *if* you are running in such an environment, you get that very useful behavior. Even if not, the fact that your garbage value is the same every time makes bugs reproducible when they might not be if you use `malloc()`. – Davislor Apr 09 '17 at 14:56
  • "like almost all modern CPUs" - That's not even true for x86 or the larger ARMs. It depends on the OS (and the capabilities of the CPU). Typically only high-end OS with virtual memory support provide this feature. And that is not a matter of the value `0`, but how accesses to the address it represents are treated. Even still some larger OS don't disallow such accesses. Using `malloc` or `calloc` depends on many factors; if you frequently `calloc` some GiB, you start thinking. – too honest for this site Apr 09 '17 at 15:07
  • I’ve added a section at the end to address your concerns, and I’ll rephrase that line to placate you. – Davislor Apr 09 '17 at 15:26
  • You are aware, the question is about C and C++ is a different language? There is absolutely no reason to talk someone into using C++ without reason and it is not very well received here either. – too honest for this site Apr 09 '17 at 15:29
  • "What happens when you pass or return the name of an array is what computer scientists call pass-by-reference" - Sorry for being direct, but that is nonsense. it is called "decaying", i.e. the array decays to a pointer to the first element and that's a C-speciality, actually a legacy deeply woven into the language, so changing it would make almost all code invalid. And where did I criticise the word "return"? Anyway, I don't have time for this. Feel free to read the standard. Here's the [final draft](http://port70.net/~nsz/c/c11/n1570.html), which is identical in all relevant aspects. – too honest for this site Apr 09 '17 at 15:34
  • Of course. Your point, as I understood it, was that you only use the phrase “return an array” to mean something different that isn’t possible in C. Since it isn’t possible in C, I needed to go to another language in order to give an example of it. – Davislor Apr 09 '17 at 15:35
  • I apologize for not using certain words exactly the same way that you do. This seems like a good place to leave it. – Davislor Apr 09 '17 at 15:37
  • "you want to remember that an array is just an address in memory, plus a length. It’s treated interchangeably with the address of the first element." --> That is an oversimplification that applies in some situations, yet not all. An array is not an address and so this answer begins with a falsehood. Counter example: `char a[42]; int x = sizeof a;` and `char a[42]; int x = sizeof &a[0];` does not yield the same value for `x`. – chux - Reinstate Monica Apr 09 '17 at 19:12
  • Okay, `a` and `&a[0]` aren’t interchangeable in that context, because `&a[0]` is a dimensionless pointer. An array also has a length (and both pointers and arrays have an object type), and `sizeof(a) / sizeof(a[0])` is equal to the length of `a`. That’s one of the examples in the standard. I’ll add a qualifier to say in which context they are interchangeable. – Davislor Apr 09 '17 at 20:56
  • @chux That better? – Davislor Apr 09 '17 at 21:01