2

I'm having some trouble understanding the functionality of malloc and perhaps that's why I'm seeing this issue, but hopefully someone here can help me understand.

I am allocating a 2D array using the following function:

int16_t** create_2d_array(uint8_t num_rows, uint16_t num_cols){
    uint8_t i = 0;  

    int16_t **arr = (int16_t **)malloc(num_rows * sizeof(int16_t *));

    for(i=0; i<num_rows; i++) {
        arr[i] = (int16_t *)malloc(num_cols * sizeof(int16_t));
    }               

    return arr;     
}

I am calling this function like so:

twoD = create_2d_array(4, 512);

If I stop my code at the beginning of the for loop and check the contents of arr using my GDB terminal, then I get the following:

gdb $ p/x arr[0]
$96 = 0x2001fa00

gdb $ p/x arr[1]
$97 = 0x0

gdb $ p/x arr[2]
$98 = 0x0

gdb $ p/x arr[3]
$99 = 0x0

This implies, at least to me, that arr[0] was allocated properly, but not the other elements of arr.

Isn't malloc supposed to determine if the size requested can be allocated and if not, then it should return a NULL pointer? At least that's my understanding of malloc. Is there something I'm doing wrong?

As a test, I executed the following line:

twoD_temp = create_2d_array(2, 4);

Again I stop the execution at the beginning of the for loop and print the contents of arr.

gdb $ p/x arr[0]
$121 = 0x2001fa00

gdb $ p/x arr[1]
$122 = 0x2001fa10

This is what I would expect. First index is a valid pointer, and the second index is also a valid pointer since I created an array of pointers.

After the for loop executes I print the same contents:

gdb $ p/x arr[0]
$125 = 0x2001fa00

gdb $ p/x arr[1]
$126 = 0x2001fa10

This is still the same which is what I would expect. The only difference now is that there is memory allocated for the columns.

klutt
  • 30,332
  • 17
  • 55
  • 95
cDreamer
  • 385
  • 5
  • 18
  • 1
    They weren't allocated if you stopped on the first malloc. – tkausl Jul 08 '19 at 22:54
  • Where do you break the execution? after the first malloc in the function or the first malloc in the loop? Or after the loop? – Ajay Brahmakshatriya Jul 08 '19 at 22:54
  • 1
    FYI, your understanding of `malloc` doesn't apply in the presence of overcommit; on most default Linux setups, `malloc` always succeeds (unless you ask for so much memory that you can't allocate the virtual memory addresses in a contiguous block). When you actually *use* it, if the system discovers it can't satisfy the request, the OOM (out of memory) killer gets invoked to "fix" things. – ShadowRanger Jul 08 '19 at 22:59
  • @tkausl, I edited my question to better clarify where I stop my code. I stop it at the beginning of the for loop. – cDreamer Jul 08 '19 at 22:59
  • @AjayBrahmakshatriya, I stop it at the beginning of the for loop. – cDreamer Jul 08 '19 at 22:59
  • *Why* do you use these smallish types? Function arguments are basically promoted to native ints, and so are automatic variables (to register-sized units)[unless you are programming a 6502...] – wildplasser Jul 08 '19 at 23:02
  • 4
    Please don't cast return values from malloc in C, it can hide subtle problems and is totally unnecessary. – paxdiablo Jul 08 '19 at 23:07
  • PSA: [Don't cast the result of `malloc`](https://stackoverflow.com/questions/605845/do-i-cast-the-result-of-malloc). – tadman Jul 08 '19 at 23:10
  • It helps to `#include` at the least to reproduce your failing program. – wildplasser Jul 08 '19 at 23:52

4 Answers4

4

After just the first malloc, arr is a pointer to a chunk of memory that contains nothing but junk. The for loop sets the individual entries to point to the rows.

So neither arr[0] nor arr[1] should contain any particular value until the for loops sets their values to point to the various rows that it creates.

Let's look at the code carefully:

int16_t **arr = (int16_t **)malloc(num_rows * sizeof(int16_t *));

This allocates one block of memory, large enough to hold one pointer for each row. The variable arr will point to this memory. The block of memory contains junk.

for(i=0; i<num_rows; i++) {
    arr[i] = (int16_t *)malloc(num_cols * sizeof(int16_t));
}     

This sets arr[0] to point to a block of memory large enough to hold a row. Until this for loop executes, arr[0] and the other entries in the array just contain junk.


Perhaps a diagram will help. After the first allocation before the loop (of arr, the block of ponters), this is what you have:

       +--------+
arr -> | arr[0] | -> points to some arbitrary location
       +--------+
       | arr[1] | -> points to some arbitrary location
       +--------+
       | arr[2] | -> points to some arbitrary location
       +--------+
       | arr[3] | -> points to some arbitrary location
       +--------+

Those pointers will point to arbitrary locations because malloc allocates the memory but does not initialise it to anything.

And that's the state where you're examining everything, so each arr[] can be any value. Once you go through the loop once, you have:

       +--------+    +----------------+
arr -> | arr[0] | -> | arr[0][0..511] |
       +--------+    +----------------+
       | arr[1] | -> points to some arbitrary location
       +--------+
       | arr[2] | -> points to some arbitrary location
       +--------+
       | arr[3] | -> points to some arbitrary location
       +--------+

It's only at that point do your second level allocations begin to point at something useful.

paxdiablo
  • 854,327
  • 234
  • 1,573
  • 1,953
David Schwartz
  • 179,497
  • 17
  • 214
  • 278
  • famous last words! – wildplasser Jul 08 '19 at 22:58
  • "Also, I don't think a uint8_t can count up to 512!" - lucky then that it only counts up to four in this case :-) Other than that throwaway line, a good answer. – paxdiablo Jul 08 '19 at 23:04
  • 2
    @paxdiablo Darn! I liked that line! – David Schwartz Jul 08 '19 at 23:06
  • I'm not sure I understand what you're saying here. I'm trying to allocate a 2D array, num_rows is a uint8_t type, because I'm only allocating 4 rows. num_cols is of type uint16_t because I'm allocating 512 columns. It's my understanding that by multiplying num_rows by the size of the pointer, I will be allocating an array of pointers that can fit 4 int16_t pointers. That's why I expect arr[0] to be a valid pointer, even if it is to junk, it's a valid pointer. What I don't understand is why arr[1] is not a valid pointer. – cDreamer Jul 08 '19 at 23:07
  • @cDreamer, see the second paragraph. Since you paused the code before executing the loop, the row allocations have not yet happened. So, while you have a allocated block for the row *pointers,* the row blocks themselves are not yet allocated - those pointers will therefore have arbitrary values. – paxdiablo Jul 08 '19 at 23:11
  • @cDreamer If `arr[0]` was already a valid pointer, why would you be changing its value in the `for` loop? When `i` is zero, you're doing `arr[0] = ...`. No entry in the `arr` array has any particular value until you execute the `for` loop. Only `arr` itself points to anything -- to the one and only block you allocated at that point. – David Schwartz Jul 08 '19 at 23:12
  • @paxdiablo, if after the first malloc call i create a chunk of memory large enough to fit 4 int16_t pointers then I should be able to call arr[0] to see the starting location, arr[1] should show me the second pointer, and so on. Keep in mind that this is a double pointer, so the first malloc call is creating an array of pointers. Keyword here being "array". The for loop then creates the necessary columns which will actually point to the values themselves. So I guess my question is still, if I am creating an array of pointers, then why does only the first index have a valid pointer? – cDreamer Jul 08 '19 at 23:27
  • @cDreamer No. You only allocated one chunk of memory and it contains nothing in particular. All you have is one pointer to that one chunk of memory, and that's `arr`. The chunk is large enough to hold 4 pointers, but it doesn't hold any pointers at all yet because you haven't stored any pointers in that chunk. You allocated space for an array of pointers but have not yet stored any pointers in that space -- the space just contains random junk. You are *about* to set `arr[0]` to point to something (a row) but haven't done it yet. – David Schwartz Jul 08 '19 at 23:30
  • @DavidSchwartz, I just updated my question to show a test I ran. If what you're saying is true, then my test should not have given me the result I got. Unless I'm still missing something here. – cDreamer Jul 08 '19 at 23:44
  • @cDreamer If what I'm saying is true, *any* results from your test would be reasonable. The block of memory you allocated can hold anything since you've not yet done anything to make it hold anything in particular. Also, you say "since I created an array of pointers". This is not true. You simply allocated a chunk of memory that is large enough to hold a few pointers. – David Schwartz Jul 08 '19 at 23:46
  • 1
    @cDreamer (and David, hope that's okay), I've added some diagrams to hopefully explain better what's happening. – paxdiablo Jul 09 '19 at 01:37
  • 1
    @paxdiablo, that diagram actually really cleared it up for me. Thank you very much! Now I understand it. The first malloc call creates a chunk of memory and points arr to it. Then each row has to be pointed to a different chunk of memory that is then allocated within the for loop. Thanks a lot for your help guys! – cDreamer Jul 09 '19 at 01:40
  • 1
    @paxdiablo, actually I just realized why I thought arr[0] was pointing to a valid location before going through the for loop. I hadn't cleared the RAM yet. Once I power cycled the board and ran through the code again and stopped at the beginning of the for loop, I examined arr[0] and realized it was pointing to some crazy location outside of RAM. Like you said, some arbitrary value. – cDreamer Jul 09 '19 at 01:43
  • @cDreamer: You are mixing `arr` with `arr[0]` – Gerhardh Jul 09 '19 at 07:20
0

Your code doesn't have a problem. The issue is your understanding of how malloc works.

When memory is allocated dynamically via malloc, the values of the bytes are indeterminate, so they could take any value including 0. They are not guaranteed to contain any particular value.

In your example:

int16_t **arr = (int16_t **)malloc(num_rows * sizeof(int16_t *));

If we assume a pointer takes up 8 bytes and that num_rows is 4, this allocates 32 bytes of space, enough for 4 values of type int16_t *. At this point each member of the array contains no meaningful value. When you later do this:

arr[i] = (int16_t *)malloc(num_cols * sizeof(int16_t));

You assign (assuming a successful call) a valid memory address returned from malloc to a member of the array, overwriting whatever garbage value happened to be there previously. If you were to look at the elements of arr[i] at this point, i.e. arry[i][0], arry[i][1] etc. these values are also indeterminate until you assign something to them.

dbush
  • 205,898
  • 23
  • 218
  • 273
0

You've already accepted an answer (and rightly so since it explains in detail what was wrong). I know this because I contributed to it :-)

However, I'd also like to suggest another method of allocating your 2D arrays. Doing it as a series of allocations means that you are responsible for cleaning up the individual allocations when you're done with the array.

Although this can be done with another function, you have to pass the number of rows into that so you can correctly free each row allocation before freeing the pointer array allocation.

A method I've used in the past is to allocate one block of memory big enough for both the pointer array and all the row arrays, then massage it so that it looks like a normal 2D array (so that things like array[row][col] = 7 still work).

The code below does this and the only requirement is that the alignment requirements for int16_t are not more restrictive than that of int16_t* (this would be rare and you could work around it, but it's probably not necessary in the vast majority of environments):

#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <malloc.h>

void *create_2d_array(uint8_t num_rows, uint16_t num_cols, int clear_data) {
    // Create a single block big enough for both level-1 pointer array
    // and level-2 value array.

    size_t totalMemSz =
        (num_rows * sizeof(uint16_t*)) +            // pointers.
        (num_rows * num_cols * sizeof(uint16_t));   // values.
    void *memBlock = clear_data ? calloc(totalMemSz, 1) : malloc(totalMemSz);
    if (memBlock == NULL) return NULL;

    // Populate the level-1 pointers to point at the level-2 rows.

    for (size_t i = 0; i < num_rows; ++i) {
        ((int16_t**)memBlock)[i] = (int16_t*)(&(((char*)memBlock)[
            num_rows * sizeof(uint16_t*) +    // skip pointers.
            i * num_cols * sizeof(uint16_t)   // select row.
        ]));
    }

    return memBlock;
}

#include <stdio.h>
#include <stdbool.h>

void dumpArr(int16_t **addr, size_t num_rows, size_t num_cols) {
    for (size_t i = 0; i < num_rows; ++i) {
        printf("Pointer[%zd] = %p, data =", i, addr[i]);
        for (size_t j = 0; j < num_cols; ++j) {
            printf(" %2d", addr[i][j]);
        }
        putchar('\n');
    }
}

int main() {
    puts("Running ...");
    int16_t **arr = create_2d_array(4, 7, true);
    arr[0][0] = 1;
    arr[1][2] = 42;
    arr[3][6] = 77;
    dumpArr(arr, 4, 7);
    free(arr);
}

That code is a complete test program for the create_2d_array function so you can test it how you wish. The important thing is that, when you're finished with the 2D array, you simply release it with a free(arr) rather than having to do any special size-dependent processing.

A sample run of the code as it currently stands:

Running ...
Pointer[0] = 0x675440, data =  1  0  0  0  0  0  0
Pointer[1] = 0x67544e, data =  0  0 42  0  0  0  0
Pointer[2] = 0x67545c, data =  0  0  0  0  0  0  0
Pointer[3] = 0x67546a, data =  0  0  0  0  0  0 77
paxdiablo
  • 854,327
  • 234
  • 1,573
  • 1,953
-1

There's nothing that's really wrong with your code. I have a few remarks that I will take below.

Isn't malloc supposed to determine if the size requested can be allocated and if not, then it should return a NULL pointer?

Yes, that is true. But you're not testing any return values, so how would you know?

Also, it's a good habit to use the variable instead of the type for sizeof. That reduces code duplication. So write T *t= malloc(n * sizeof(*t)) instead of T *t = malloc(n * sizeof(T)). And casting is completely unnecessary and provides not benefits, unless you are compiling your C code with a C++ compiler.

So with all that in mind, I would write the code like this:

int16_t** create_2d_array(uint8_t num_rows, uint16_t num_cols)
{
    uint8_t i = 0;  

    int16_t **arr = malloc(num_rows * sizeof(*arr));
    if(!arr) {
        fprintf(stderr, "Error allocating memory\n");
        exit(EXIT_FAILURE);
    }    
    for(i=0; i<num_rows; i++) {
        arr[i] = malloc(num_cols * sizeof(*arr[0]));
        if(!arr[i]) {
            fprintf(stderr, "Error allocating memory\n");
            exit(EXIT_FAILURE);
        }    
    }        

    return arr;     
}

And if you want to go really hardcore and be able to test your own function the same way you test malloc, then you can do like this. It's one of the few accepted uses for goto

int16_t** create_2d_array(uint8_t num_rows, uint16_t num_cols)
{
    uint8_t i = 0;

    int16_t **arr = malloc(num_rows * sizeof(*arr));
    if(!arr)
        goto cleanup1;

    for(i=0; i<num_rows; i++) {
        arr[i] = malloc(num_cols * sizeof(*arr[0]));
        if(!arr[i])
            goto cleanup2;
    }

    return arr;

cleanup2:
    do {
        free(arr[i]);
    } while(i-- > 0);

cleanup1:
    free(arr);

    return NULL;
}

Then you can do this:

int16_t ** array2d = create_2d_array(5,6);
if(!array2d) { /* Handle error */ }
klutt
  • 30,332
  • 17
  • 55
  • 95