1

I want to create 5*5 2D Matrix. I usually use the following way of memory allocation:

int **M  = malloc(5 * sizeof(int *));
for (i = 0; i < 5; i++)
{
    M[i] =  malloc(5 * sizeof(int));
}

While I was reading a blog, I found also another way to do that:

int **M = malloc(5 * sizeof(int*));
M[0] = malloc((5*5) * sizeof(int));

My question is: What is the difference between both methods? Which one in more efficient?

Mike
  • 380
  • 4
  • 19
  • 3
    The second snippet is incomplete, because it leaves `M[1]` thru `M[4]` uninitialized. – user3386109 Nov 02 '15 at 21:33
  • 3
    The first one is more "efficient" in the sense that it will do what you actually want (allocate space for 5 `int` values for each `M[i]`). The second allocates space for 25 `int` values for `M[0]`, leaves `M[1]` through `M[4]` uninitialized. Which blog did you find this in? – John Bode Nov 02 '15 at 21:33
  • 2
    The difference is that the first one works and the second one is a bug. – Crashworks Nov 02 '15 at 21:36

6 Answers6

5

For the second code, note that you need to initialize the other array members for it to work correctly:

    for (int i = 1; i < 5; i++) {
        M[i] = M[0] + i * 5;
    }

So in the second code the arrays members (through all arrays) are contiguous. It does not make any difference to access them (e.g., you an still access them using M[i][j] syntax). It has the advantage over the first code to require only two malloc calls and as mentioned in the comments to favor caching which can greatly improve the access performances.

But if you plan to dynamically allocate large arrays, it is better to use the first method because of memory fragmentation (large contiguous memory allocation can be not available or can exacerbate memory fragmentation).

A similar example of this kind of dynamic allocation of arrays of arrays can be found in the c-faq: http://c-faq.com/aryptr/dynmuldimary.html

ouah
  • 142,963
  • 15
  • 272
  • 331
  • In the second case, what would happen if you tried to access `M[1][1]`? – John Bode Nov 02 '15 at 21:35
  • 3
    -1; the second case doesn't do the right thing. Additionally, if it did do the "right" thing, which would be to assign `M[1] = &M[0][5]` etc. as well, it definitely *does* make a difference in terms of performance due to caching. – MooseBoys Nov 02 '15 at 21:36
  • @ouah: yuck. I mean, yeah, it can work, but ... yuck. That assumes the OP followed through on the second half of that technique (assigning `M[1]` through `M[4]`. – John Bode Nov 02 '15 at 21:39
  • 2
    @JohnBode: I'm not understanding what is so "yuck" about wanting contiguous memory... – R_Kapp Nov 02 '15 at 21:40
  • @R_Kapp the "yuck" part is having this waste of space to create a bunch of pointers into contiguous memory. By all means use contiguous memory, but not via this technique. – M.M Nov 02 '15 at 21:55
  • @M.M: If you have something that is conceptually a matrix, and you want your code to reflect that (i.e., you want to be able to access `M[i][j]` instead of `M[i * 5 + j]` after declaring a contiguous array of 25), what other options do you have? – R_Kapp Nov 02 '15 at 21:59
  • 2
    @R_Kapp: `T (*arr)[N] = malloc( sizeof *arr * M );`. Contiguous, no need for manually assigning additional pointers. – John Bode Nov 02 '15 at 22:00
  • @JohnBode: Hm... did not know about that syntax before. Learn something new every day, I guess. – R_Kapp Nov 02 '15 at 22:01
5

After seeing ouah's answer and seeing the example in the C FAQ, I now understand where the second technique comes from, although I personally wouldn't use it where I could help it.

The main problem with the first approach you show is that the rows in the array are not guaranteed to be adjacent in memory; IOW, the object immediately following M[0][4] is not necessarily M[1][0]. If two rows are allocated from different pages, that could degrade runtime performance.

The second approach guarantees that all the rows will be allocated contiguously, but you have to manually assign M[1] through M[4] to get the normal M[i][j] subscripting to work, as in

for ( size_t i = 0; i < 5; i++ )
  M[i] = M[i-1] + 5;

IMO it's a clumsy approach compared to the following:

int (*M)[5] = malloc( sizeof *M * 5 );

This also guarantees that the memory is allocated contiguously, and the M[i][j] subscripting works without any further effort.

However, there is a drawback; on compilers that don't support variable-length arrays, the array size must be known at compile time. Unless your compiler supports VLAs, you can't do something like

size_t cols;
...
int (*M)[cols] = malloc( sizeof *M * rows );

In that case, the M[0] = malloc( rows * cols * sizeof *M[0]) followed by manually assigning M[1] through M[rows - 1] would be a reasonable substitute.

John Bode
  • 119,563
  • 19
  • 122
  • 198
  • I wish I had seen this before posting. This is the correct answer. – user3386109 Nov 02 '15 at 22:01
  • @ouah: No, it's a *pointer* to a VLA. The array itself will be allocated from the heap. – John Bode Nov 02 '15 at 22:02
  • 1
    @ouah: right, which is the one drawback of this technique; if you don't have VLA support, the size of the outer dimension must be specified at compile time. Or you use a different technique. – John Bode Nov 02 '15 at 22:05
  • @JohnBode Actually I do think on C99 (and C versions after) that's the best solution to dynamically allocate a contiguous array of arrays. – ouah Nov 02 '15 at 22:10
  • The good news is that all compilers except MSVC do have VLA support – M.M Nov 02 '15 at 22:13
  • @M.M: As of the 2011 standard, VLA support is now optional, so it may not be universally supported going forward. – John Bode Nov 02 '15 at 22:14
  • @ouah: Thanks, fixed - got `malloc` and `calloc` wires crossed, I guess. – John Bode Nov 02 '15 at 22:15
  • Technically `sizeof *M` causes undefined behaviour in your last example ([see this thread](http://stackoverflow.com/questions/32985424/is-the-operand-of-sizeof-evaluated-with-a-vla)) so you could use `sizeof(int[rows][cols])` instead. – M.M Nov 02 '15 at 22:17
  • As far as I know malloc doesn't guarantee physically contiguous memory, it is contiguous in virtual memory. So I cannot see the point here. Am I wrong? – terence hill Nov 02 '15 at 23:04
  • @M.M: I agree with Keith's position in that thread - this is a defect in the standard. – John Bode Nov 03 '15 at 12:52
4

I hope I'm not missing something here but here's my attempt to answer the question "What is the difference...". If I am completely off base, forgive me and I will correct my answer but here goes:

I tried drawing out what is happening in your two mallocs so what I have to say is tied to the picture included which I drew by hand (hand crafted answers?)

First option:

For the first option, you allocate a memory block the size of 5 int*s. M, which is an int** points to the start of that memory block.

Then, you go over each of the memory blocks (the size of int*) and in each block you put in the address of a memory block the size of 5 ints. Note that these are located in some random portion of your memory (the heap) that has enough space to take the size of 5 ints.

This is the key - it's a noncontiguous block of memory. So if you think about memory as an array, you are pointing at different start locations in the array.

Second Option

Your second does the allocation of int** exactly the same. But instead, it allocates the size of 25 ints and returns places the address of that array in the memory block M[0]. Note: you've never placed any address in the memory locations M[1] - M[4].

So, what happens? You have a contiguous block of 25 ints with an address that can be found in M[0]. What happens when you try getting M[1]? You guessed it - it's empty or contains junk values. Even more, it's a value that does not point to an allocated memory space so you Segfault.

enter image description here

fpes
  • 964
  • 11
  • 22
3

If you want to allocate a 5x5 array in contiguous memory, the correct approach would be

int rows = 5;
int cols = 5;
int (*M)[cols] = malloc(rows * sizeof(*M));

You can then access the array with normal array indexing, e.g.

M[3][2] = 6;
user3386109
  • 34,287
  • 7
  • 49
  • 68
1

int **M = malloc(5 * sizeof(int *)); refers to allocating memory for a pointer M[i] = malloc(5 * sizeof(int)); refers to allocating memory for a variable of int.

Maybe this will help you understand what is going on:

int **M  = malloc(5 * sizeof(void *));
/* size of 'void *' and size of 'int *' are the same */
for (i = 0; i < 5; i++)
{
    M[i] =  malloc(5 * sizeof(int));
}
  • To expand on this, the second code example provided by OP only initialized bucket 0 of M. The code that @GRC provided will allocate buckets M[0] to M[4]. – kyle Nov 02 '15 at 21:49
0

Another little difference when using malloc((5*5) * sizeof(int));. Certainly a side issue to what OP is looking for, but still a concern.

Both of the below are the same as the order of the 2 operands still result in using size_t math for the product.

#define N 5
malloc(N * sizeof(int));
malloc(sizeof(int) * N);

Consider:

#define N some_large_value
malloc((N*N) * sizeof(int));

The type of the result of sizeof() is type size_t, an unsigned integer type, that is certainly has SIZE_MAX >= INT_MAX, possible far larger. so to avoid int overflow that does not overflow size_t math use

malloc(sizeof(int) * N * N);
chux - Reinstate Monica
  • 143,097
  • 13
  • 135
  • 256