Return a malloc’ed matrix while being able to use subscript notation

Question

I have an exercise where I am supposed to use fixed-size arrays and in/out parameters to operate on matrices (add, scanf, print, etc.), but I’d like to work with arbitrary-size matrices and return them, rather than adding more (in/)out parameters each time (thus possibly allowing a more “functional” style).

Since I want to return them, I suppose I need malloc to keep the array in memory past the function scope. Since I want to use multidimensional subscript notation (mat[x][y] rather than mat[x*len+y] or mat+x*len+y), I guess I should use some kind of VLA or cast… yet it seems casting to an array type is forbidden (but I’m often going to return pointers, so how do I use subscript notation on them if I can’t cast?), and apparently I “may not initialize a variable-sized object”, as the compiler says (even though it’s not directly an array but a pointer to an array), when using this notation:

int *tab[x][y]=malloc(x*y*sizeof(int));

I also get “invalid initializer” if I replace x and y with constant values like 3 by hand.

I spent almost a week searching; maybe it’s impossible and I should just move on… I also found this notation, which to me looks like function-pointer notation, unless it is a way to give priority to the * operator…

int (*tab)[x][y]=malloc(x*y*sizeof(int));

However, I’m not sure I understand this notation, because I get random values when I print arrays filled this way.

Previously I’ve tried to use VLAs (variable length arrays) and GNU extension for giving array lengths as parameter:

void
printMat (int h, int w; int tab[h][w], int h, int w)
{
   [code using tab[x][y]]
}

but I soon realized I needed to deal with pointers and malloc anyway for an “add” function that adds two matrices and returns a pointer to a new malloc’ed matrix…

In case I wasn’t specific enough, I’d especially like to know how I should declare arguments and return types so that I can use them as multidimensional arrays without an intermediary variable, while actually passing a pointer (that’s already what passing a normal multidimensional array as a parameter does anyway, right?)

Okay, after many tests and tries, it now works as I intended, even if I’m not sure I have understood everything exactly, especially what is a pointer and what is not (I may have confused myself by trying to figure this out with gdb; I should probably investigate further whether gdb treats a normal one- or multidimensional array as an address, etc.), and today my sleep and concentration are not at their best.

Now, I’d like a proper answer to the second part of my initial question: how do I return the matrix? Is there a proper generic type (other than a meaningless void*) appropriate for a pointer to a 2-dimensional array (like int(*)[][], but one that would work)? If that is too generic, what’s the proper way to cast the returned pointer so I can use multidimensional subscript notation on it? Is (int(*)[3][3]) correct?

However, if I get nothing satisfactory for this (a well-justified “it’s impossible in C” is fine, I guess), I’ll mark @JohnBod’s current answer as solving the problem, since he confirmed multidimensional VLA malloc with a complete and explanatory answer on multidimensional arrays, fully answering the first part of the question, and gave several pointers toward the second (if there is an answer).

#include <stdio.h>
#include <stdlib.h>

void
print_mat (int x, int y; int mat[x][y], int x, int y)
{
  for (int i = 0; i < x; i++)
    {
      for (int j=0; j < y ; j++)
        printf("%d ", mat[i][j]);
      putchar('\n');
    }
  putchar('\n');
}

void*
scan_mat (int x, int y)
{
  int (*mat)[x][y]=malloc(sizeof(*mat));
  for (int i = 0; i < x ; i++)
    for (int j = 0; j < y; j++)
      {
        printf("[%d][%d] = ", i, j);
        scanf("%d", &((*mat)[i][j]));
      }
  return mat;
}

void*
add_mat (int x, int y; int mat1[x][y], int mat2[x][y], int x, int y)
{
  int (*mat)[x][y]=malloc(sizeof(*mat));
  #pragma GCC ivdep
  for (int i = 0; i < x ; i++)
    for (int j = 0; j < y; j++)
      (*mat)[i][j]=mat1[i][j]+mat2[i][j];
  return mat;
}

int
main ()
{
  int mat1[3][3] = {1, 2, 3,
                    4, 5, 6,
                    7, 8, 9},
    (*mat2)[3][3] = scan_mat(3, 3);
  print_mat(mat1, 3, 3);
  print_mat(*mat2, 3, 3);
  print_mat((int(*)[3][3])add_mat(mat1, *mat2, 3, 3), 3, 3); // both appear to work… array decay?
  print_mat(*(int(*)[3][3])add_mat(mat1, *mat2, 3, 3), 3, 3);
  printf("%d\n", (*(int(*)[3][3])add_mat(mat1, *mat2, 3, 3))[2][2]);
  return 0;
}

and the input/output:

[0][0] = 1
[0][1] = 1
[0][2] = 1
[1][0] = 1
[1][1] = 1
[1][2] = 1
[2][0] = 1
[2][1] = 1
[2][2] = 1
1 2 3 
4 5 6 
7 8 9 

1 1 1 
1 1 1 
1 1 1 

2 3 4 
5 6 7 
8 9 10 

2 3 4 
5 6 7 
8 9 10 

10

`int *tab[x][y]` makes `tab` be an array of `x` arrays of `y` pointers to `int`. `int (*tab)[x][y]` makes `tab` be a pointer to an array of `x` arrays of `y` `int` elements. — Some programmer dude, Mar 19 '18 at 13:53
Possible duplicate: [Correctly allocating multi-dimensional arrays](https://stackoverflow.com/questions/42094465/correctly-allocating-multi-dimensional-arrays). — Lundin, Mar 19 '18 at 13:57
@JonathanLeffler: apart from the fact that before C99 they were in fact a GNU extension (in gnu89), I was referring to the forward declaration of parameters, which lets the arguments defining the size of an array come *after* that array in argument order, by declaring them first before a semicolon: `fun (int x, int y; array[x][y], int x, int y)`, which is called like `fun (array, 3, 3)`, instead of the standard form `fun (int x, int y, array[x][y])` called as `fun(3, 3, array)`, where the size arguments necessarily precede the array. – galex-713, Mar 19 '18 at 15:03
@JonathanLeffler: not anymore no, but I never said VLA were currently a GNU extension, I said using that `;` so to declare following arguments *before* so you can then use them in a *preceding* array type declaration was. — galex-713, Mar 19 '18 at 15:31
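For comparison, the standard ordering mentioned in the comment above (sizes first, array last) can be sketched as follows. This is only an illustration: `sum_mat` is a hypothetical helper name, and it assumes a compiler with VLA support (C99, or C11 with the optional VLA feature).

```c
#include <assert.h>

/* Sums every element of an h-by-w matrix. The sizes are declared first so
   that h and w are in scope when the array parameter's type is written. */
long
sum_mat (int h, int w, int tab[h][w])
{
  assert (h > 0 && w > 0);
  long total = 0;
  for (int i = 0; i < h; i++)
    for (int j = 0; j < w; j++)
      total += tab[i][j];   /* ordinary two-dimensional subscripting */
  return total;
}
```

A call then looks like `sum_mat (2, 3, m)` for an `int m[2][3]`, with the sizes passed before the array.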
Your question is an unusual mixture of naïveté and sophistication. The puzzlement over `int *array[x][y];` is on the naïve end; it is an array of pointers to integers and you must use `int (*array)[x][y]` to get a pointer to an array. That's a non-negotiable consequence of the rules of C type formation and operator precedence. Then you delve into the intricacies of arcane and archaic GNU C extensions. You confused me — I'm sorry. — Jonathan Leffler, Mar 19 '18 at 15:36

John Bode · Accepted Answer · 2018-03-19T14:33:31.380

If you want to allocate a buffer of type T, the typical procedure is

T *ptr = malloc( sizeof *ptr * N ); // sizeof *ptr == sizeof (T)

You're allocating enough space for N elements of type T.

Now let's replace T with an array type, R [M]:

R (*ptr)[M] = malloc( sizeof *ptr * N  ); // sizeof *ptr == sizeof (R [M])

You're allocating enough space for N elements of type R [M]; in other words, you've just allocated enough space for an N by M array of R. Note that the semantics are exactly the same as for the array of T above; all that's changed is the type of ptr.

Applying that to your example:

int (*tab)[y] = malloc( sizeof *tab * x );

You can then index tab as you would any 2D array:

tab[i][j] = new_value(); // with 0 <= i < x and 0 <= j < y
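Put together, the allocation and the indexing might look like the sketch below. `fill_and_read` is a hypothetical helper invented for illustration; it assumes VLA support and frees the buffer before returning, so it only demonstrates the subscript notation, not how to hand the matrix back to a caller.

```c
#include <assert.h>
#include <stdlib.h>

/* Allocates a rows-by-cols matrix as one block, stores i * cols + j in
   each cell, and returns the value read back from [r][c].
   Returns -1 if malloc fails. */
int
fill_and_read (int rows, int cols, int r, int c)
{
  assert (rows > 0 && cols > 0);
  assert (r >= 0 && r < rows && c >= 0 && c < cols);

  /* tab points to the first row; each row is an array of cols ints. */
  int (*tab)[cols] = malloc (sizeof *tab * rows);
  if (tab == NULL)
    return -1;

  for (int i = 0; i < rows; i++)
    for (int j = 0; j < cols; j++)
      tab[i][j] = i * cols + j;

  int value = tab[r][c];
  free (tab);
  return value;
}
```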

Edit

Answering the comment:

yet, still, I’m not sure to understand: what’s the meaning of the “(*tab)” syntax? it’s not a function pointer I guess, but why wouldn’t *tab without parenthesis work: what’s the actual different meaning? why doesn’t it work and what does change then?

The subscript [] and function call () operators have higher precedence than unary *, so a declaration like

int *a[N];

is parsed as

int *(a[N]);

and declares a as an array of pointers to int. To declare a pointer to an array, you must explicitly group the * operator with the identifier, like so:

int (*a)[N];

This declares a as a pointer to an array of int. The same rule applies to function declarations. Here's a handy summary:

T *a[N];    // a is an N-element array of pointers to T
T (*a)[N];  // a is a pointer to an N-element array of T
T *f();     // f is a function returning pointer to T
T (*f)();   // f is a pointer to a function returning T
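One way to see the difference between the first two declarations is to compare sizes. The sketch below uses hypothetical function names and an arbitrary element count `ELEMS`; it is an illustration, not part of the original answer.

```c
#include <assert.h>
#include <stddef.h>

enum { ELEMS = 4 };   /* arbitrary element count chosen for the example */

/* a is an array holding ELEMS pointers to int. */
size_t
size_array_of_pointers (void)
{
  int *a[ELEMS];
  return sizeof a;      /* ELEMS * sizeof (int *) */
}

/* a is a single pointer, whatever it points to. */
size_t
size_pointer_to_array (void)
{
  int (*a)[ELEMS];
  return sizeof a;      /* size of one pointer */
}

/* *a is the whole array that a points to. */
size_t
size_pointed_to_array (void)
{
  int target[ELEMS] = {0};
  int (*a)[ELEMS] = &target;
  assert (sizeof *a == sizeof target);
  return sizeof *a;     /* ELEMS * sizeof (int) */
}
```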

In your code,

int *tab[x][y]=malloc(x*y*sizeof(int));

declares tab as a 2D array of pointers, not as a pointer to a 2D array, and a call to malloc(...) is not a valid initializer for a 2D array object.

The syntax

int (*tab)[x][y]=malloc(x*y*sizeof(int));

declares tab as a pointer to a 2D array, and a call to malloc is a valid initializer for it.

But...

With this declaration, you'll have to explicitly dereference tab before indexing into it, like so:

(*tab)[i][j] = some_value();

You're not indexing into tab, you're indexing into what tab points to.
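As a rough sketch of that form (hypothetical function name, VLA support assumed), the pointer to the whole 2D array has to be dereferenced, or equivalently subscripted with `[0]`, before the row and column indices are applied:

```c
#include <assert.h>
#include <stdlib.h>

/* Allocates an x-by-y matrix through a pointer to the whole 2D array,
   stores 10 * i + j in each cell, and returns the last cell.
   Returns -1 if malloc fails. */
int
whole_array_demo (int x, int y)
{
  assert (x > 0 && y > 0);
  int (*tab)[x][y] = malloc (sizeof *tab);   /* one object of type int[x][y] */
  if (tab == NULL)
    return -1;

  for (int i = 0; i < x; i++)
    for (int j = 0; j < y; j++)
      (*tab)[i][j] = 10 * i + j;             /* dereference, then index */

  /* tab[0] designates the same array as *tab. */
  int last = tab[0][x - 1][y - 1];
  free (tab);
  return last;
}
```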

Remember that in C, declaration mimics use - the structure of a declarator in a declaration matches how it will look in the executable code. If you have a pointer to an int and you want to access the pointed-to value, you use the unary * operator:

x = *ptr;

The type of the expression *ptr is int, so the declaration of ptr is written

int *ptr;

The same goes for arrays: if the ith element of an array has type int, then the expression arr[i] has type int, and thus the declaration of arr is written as

int arr[N];

Thus, if you declare tab as

int (*tab)[x][y] = ...;

then to index into it, you must write

(*tab)[i][j] = ...;

The method I showed avoids this. Remember that the array subscript operation a[i] is defined as *(a + i) - given an address a, offset i elements (not bytes!) from a and dereference the result. Thus, the following relationship holds:

*a == *(a + 0) == a[0]

This is why you can use the [] operator on a pointer expression as well as an array expression. If you allocate a buffer as

T *p = malloc( sizeof *p * N );

you can access each element as p[i].
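A small check of that equivalence, written as a sketch with a hypothetical function name:

```c
#include <assert.h>
#include <stdlib.h>

/* Returns 1 if p[i] and *(p + i) name the same element for every i < n,
   and 0 otherwise (including when malloc fails). */
int
subscript_matches_pointer_arithmetic (size_t n)
{
  assert (n > 0);
  int *p = malloc (sizeof *p * n);
  if (p == NULL)
    return 0;

  int ok = 1;
  for (size_t i = 0; i < n; i++)
    {
      p[i] = (int) (i * i);
      if (&p[i] != p + i || *(p + i) != p[i])
        ok = 0;
    }
  if (*p != p[0])   /* *a == *(a + 0) == a[0] */
    ok = 0;

  free (p);
  return ok;
}
```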

So, given a declaration like

T (*a)[M];

we have the relationship

 (*a)[i] == (*(a + 0))[i] == (a[0])[i] == a[0][i];

Thus, if we allocate the array as

T (*a)[M] = malloc( sizeof *a * N );

then we can index each element of a as

a[i][j] = some_value();
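The chain of equivalences can be checked on a small buffer. In this sketch `COLS` stands in for M and is an arbitrary value picked for the example; the function name is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

enum { COLS = 3 };   /* plays the role of M; arbitrary example value */

/* Fills a rows-by-COLS matrix with i * COLS + j and returns 1 if the
   subscript and pointer forms agree, 0 otherwise or if malloc fails. */
int
row_pointer_equivalences_hold (int rows)
{
  assert (rows >= 2);
  int (*a)[COLS] = malloc (sizeof *a * rows);
  if (a == NULL)
    return 0;

  for (int i = 0; i < rows; i++)
    for (int j = 0; j < COLS; j++)
      a[i][j] = i * COLS + j;

  int ok = (*a)[1] == a[0][1]              /* (*a)[i] == a[0][i] */
           && a[1][2] == *(*(a + 1) + 2)   /* a[i][j] == *(*(a + i) + j) */
           && a[1][2] == COLS + 2;         /* and the stored value is intact */
  free (a);
  return ok;
}
```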

edited Mar 19 '18 at 14:33

answered Mar 19 '18 at 13:53

John Bode


why omit the parentheses after `sizeof`? That confuses me a little, as I’m not good at remembering operator precedence and I thought of sizeof as a macro/function for a long time… – galex-713 Mar 19 '18 at 13:58
`sizeof *tab * x` looks a bit cryptic, especially to beginners. It might be easier to read the code if you use this alternative style instead: `int (*tab)[y] = malloc( sizeof(int[x][y]) );` – Lundin Mar 19 '18 at 13:59
@galex-713 When `sizeof` is applied to an expression, rather than a pure type such as `int`, you can omit the parentheses (or you can keep them, it does no harm). For an alternative style, see my comment above this one. – Lundin Mar 19 '18 at 14:00
@galex-713: `sizeof` is an operator, not a function. If the operand is a type name, then parentheses are required - `sizeof (int)`, `sizeof (double)`, etc. If the operand is an expression, then parentheses are not required - `sizeof x`, `sizeof *ptr`, etc. – John Bode Mar 19 '18 at 14:01
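The two spellings discussed in these comments should yield the same byte count. The following sketch (hypothetical function name, VLA support assumed) compares the expression form with the parenthesized type form:

```c
#include <assert.h>
#include <stdlib.h>

/* Returns 1 if both sizeof spellings give x * y * sizeof (int),
   0 otherwise or if malloc fails. */
int
sizeof_forms_agree (size_t x, size_t y)
{
  assert (x > 0 && y > 0);

  /* Type operand: the parentheses are required. */
  int (*tab)[y] = malloc (sizeof (int[x][y]));
  if (tab == NULL)
    return 0;

  /* Expression operand: no parentheses needed; sizeof binds tighter
     than the binary *, so this is (sizeof *tab) * x. */
  size_t from_expression = sizeof *tab * x;
  size_t from_type = sizeof (int[x][y]);

  free (tab);
  return from_expression == from_type
         && from_type == x * y * sizeof (int);
}
```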
yet, still, I’m not sure to understand: what’s the meaning of the “(*tab)” syntax? it’s not a function pointer I guess, but why wouldn’t `*tab` without parenthesis work: what’s the actual different meaning? why doesn’t it work and what does change then? – galex-713 Mar 19 '18 at 14:03
@Lundin: C *is* cryptic for beginners; there's no getting around it. And like I said in the answer, the style is identical to allocating a 1D array. I'd rather emphasize a consistent style that can be used everywhere, even if it isn't immediately obvious at first. – John Bode Mar 19 '18 at 14:04
What’s the type of `int (*tab)[y]` then? It’s not `int[x][y]`, as it’s a pointer and I can’t return anything that isn’t a pointer there. Is it `int(*)[x][y]`? Is that allowed to be returned? And how do I specify this as a parameter? Would `fun (int x, int y; int (*tab)[x][y], int x, int y)` work? Is that an actual VLA pointer? – galex-713 Mar 19 '18 at 14:12
I tried this and it still returns garbage; I’m probably doing something incorrect with parameters and return values, cf. the initial post. – galex-713 Mar 19 '18 at 14:21
Nice explanation of the `int (*tab)[y] = malloc( sizeof *tab * x );` syntax. – cmaster - reinstate monica Mar 19 '18 at 14:45
@galex-713 `int (*tab)[y]` is indeed a pointer. However, you need `y` to be in scope when you write down its type. This is why it's not possible to return it directly from a function. The usual way is to use an output parameter, which is a pointer to the pointer that you want to return: `void foo(int x, int y, int (**outTab)[y]) { ... *outTab = tab = malloc(x*sizeof*tab); }` Note the two stars at the `outTab` argument: You want the caller to pass the address where your function can return the pointer. Call it with `int (*tab)[y]; foo(x, y, &tab);` – cmaster - reinstate monica Mar 19 '18 at 14:51
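Spelled out as a complete function, the output-parameter approach from this comment might look like the sketch below. The name `alloc_mat`, the zero fill, and the integer status code are assumptions added for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Allocates an x-by-y matrix, zeroes it, and stores the pointer through
   out_tab. Returns 0 on success and -1 if malloc fails, in which case
   *out_tab is left untouched. */
int
alloc_mat (int x, int y, int (**out_tab)[y])
{
  assert (x > 0 && y > 0 && out_tab != NULL);
  int (*tab)[y] = malloc (sizeof *tab * x);
  if (tab == NULL)
    return -1;

  for (int i = 0; i < x; i++)
    for (int j = 0; j < y; j++)
      tab[i][j] = 0;

  *out_tab = tab;   /* hand the pointer back to the caller */
  return 0;
}
```

The caller declares `int (*tab)[y];`, passes `&tab`, uses `tab[i][j]` afterwards, and eventually calls `free (tab)`.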
@galex-713: Since parentheses are always permitted with `sizeof`, I always use them. The 'sometimes not required' rule is real but just complicates life wholly unnecessarily — in my opinion. There are those who vehemently disagree with me on the topic. – Jonathan Leffler Mar 19 '18 at 14:52
@cmaster: the question specified I wanted to return a pointer and being able to use it as a multidimensional matrix, without an intermediary variable, be it with a cast or some strange type or typedef… but not with an in/out parameter. Do you mean it’s impossible? – galex-713 Mar 19 '18 at 14:56
@galex-713: Not impossible, just far more of a pain in the ass than it's worth. A pointer to an `N`-element array is a different, incompatible type from a pointer to an `M`-element array where `N` != `M`. The casting gymnastics you'd have to go through are painful and easy to get wrong. – John Bode Mar 19 '18 at 15:02
@galex-713 Not impossible, you can still do `void* foo(...) { return tab; }`, and then, at the calling site `int (*tab)[y] = foo(...);`. But you'd lose all type safety with that: **Any mismatch of the array sizes or element type will invoke UB without any warning**. – cmaster - reinstate monica Mar 19 '18 at 15:32
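A minimal sketch of that `void *` route follows; `new_mat` and its fill pattern are hypothetical. As the comment warns, nothing checks that the caller's declared row length matches the `y` used for the allocation, so keeping them in sync is entirely the caller's job.

```c
#include <assert.h>
#include <stdlib.h>

/* Allocates an x-by-y matrix filled with i * y + j and returns it as an
   untyped pointer, or NULL if malloc fails. The caller must assign the
   result to an int (*)[y] with the SAME y and free it when done. */
void *
new_mat (int x, int y)
{
  assert (x > 0 && y > 0);
  int (*tab)[y] = malloc (sizeof *tab * x);
  if (tab == NULL)
    return NULL;

  for (int i = 0; i < x; i++)
    for (int j = 0; j < y; j++)
      tab[i][j] = i * y + j;

  return tab;   /* converts implicitly to void * */
}
```

At the call site, `int (*m)[3] = new_mat (2, 3);` gives back the `m[i][j]` notation without a cast, since `void *` converts implicitly to any object pointer type.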
@cmaster: what is UB? and isn’t there any generic “pointer to array” type? and/or some proper way to correctly cast the returned value? – galex-713 Mar 19 '18 at 15:37
@galex-713: See my comment above. There is no generic "pointer to array" type. Arrays of different sizes are different types (i.e., `int a[10]` is a different type from `int a[11]`), so *pointers* to arrays of different sizes are also different types (`int (*)[10]` is a different type from `int (*)[11]`). Like cmaster says, you can return a `void *`, but as soon as you do that you throw what little type safety you had out the window and into incoming traffic, and the potential for mayhem skyrockets. It's *really easy* to get it wrong, so it's not recommended. – John Bode Mar 19 '18 at 15:41
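The claim that pointers to arrays of different sizes are distinct types can be observed with C11 `_Generic`, which selects a branch by type. This is only an illustrative sketch; the macro `ROW_LENGTH` and the function names are made up for the example, and a C11 compiler is assumed.

```c
#include <assert.h>

/* Picks a value according to the static type of p. The two pointer types
   are incompatible, so they may appear as separate associations. */
#define ROW_LENGTH(p) \
  _Generic ((p), int (*)[10]: 10, int (*)[11]: 11, default: -1)

static_assert (sizeof (int[10]) != sizeof (int[11]),
               "the two array types differ in size");

int
row_length_of_ten (void)
{
  int a[10] = {0};
  return ROW_LENGTH (&a);   /* &a has type int (*)[10] */
}

int
row_length_of_eleven (void)
{
  int a[11] = {0};
  return ROW_LENGTH (&a);   /* &a has type int (*)[11] */
}

int
row_length_of_other (void)
{
  int a[12] = {0};
  return ROW_LENGTH (&a);   /* no matching association: default branch */
}
```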
@JohnBode: so you mean that, since x and y (my two dimension sizes) are unavailable in the scope of the function’s return type declaration, if I want to return a pointer to a matrix I won’t have any proper type or type safety anyway, so there’s no recommended way to do this in C? I considered this okay, as the matrix always has the same size as the arguments in my convention… – galex-713 Mar 19 '18 at 15:46
@galex-713: That's the case, yes. – John Bode Mar 19 '18 at 15:48
@galex-713 UB stands for Undefined Behavior. It's when the C/C++ standard says that the compiled program is allowed to do anything the compiler wishes, irrespective of what the programmer said. If a program invokes undefined behavior, it may make pink elephants appear, as far as the standard is concerned. In reality, it could crash, format your hard drive, or, even worse, just continue as if nothing happened, but produce twisted results. In the context of C/C++, undefined behavior is such an important concept that we usually just say UB here on Stack Overflow, and everybody knows: Ugh, baaad! – cmaster - reinstate monica Mar 19 '18 at 16:15
Funny thing regarding the `sizeof` vs `sizeof()` style. I asked a "C beginner" who "only" has 5 years of full-time C programming experience about it and he wasn't aware of this oddity. But of course 5 years is nothing, he is just a rookie. C programming in a nutshell. Personally I only ever write `sizeof false <:ptr??)*N` and everyone who can't understand that line should go program VB at McDonalds instead. – Lundin Mar 19 '18 at 16:56
@Lundin: For me and my middle-aged eyes, it's all about reducing visual clutter; if the parens aren't strictly necessary, I don't use them; they introduce too much visual noise and make code harder for me to scan. Same reason I add spaces between parens and arguments - it helps me see things more clearly. If I were on a team with a coding standard that forbade those practices, I'd follow it (bitching the whole time), but for my own code, that's how I roll. – John Bode Mar 19 '18 at 17:14
