Dynamically allocate contiguous memory for a "rectangular 2d array", without using VLAs

Question

This is one of those questions where there are so many answers, and yet none do the specific thing.

I tried to look at all of these posts — 1 2 3 4 5 6 7 8 9 — and every time the solution would be either using VLAs, using normal arrays with fixed dimensions, or using pointer to pointer.

What I want is to allocate:

dynamically (using a variable set at runtime)
rectangular ("2d array") (I don't need a jagged one. And I guess it would be impossible to do it anyway.)
contiguous memory (in #8 and some other posts, people say that pointer to pointer is bad because of heap stuff and fragmentation)
no VLAs (I heard they are the devil and to always avoid them and not to talk to people who suggest using them in any scenario).

So please, if there is a post I skipped, or didn't read thoroughly enough, that fulfils these requirements, point me to it.
Otherwise, I would ask of you to educate me about this and tell me if this is possible, and if so, how to do it.

Just use malloc/realloc. Use `array[j*cols + i]` to index `i,j` — stark, Nov 26 '22 at 17:19
*I heard they are the devil and to always avoid them and not to talk to people who suggest using them in any scenario* Keep listening to those people and pretty soon you'll also be burning witches. Do those same people also tell you not to use fixed-size arrays because `char myBigArray[ULLONG_MAX];` won't fit on the stack? That's logically no different than misusing VLAs as local variables without size checks. If you can't competently use VLAs, I'd question your competence in C in general. — Andrew Henle, Nov 26 '22 at 18:46
VLAs are not all bad. In fact, they're very good as arguments to functions or when dynamically allocated via `malloc()` et al. Where they are risky is when you make an allocation on the stack. Then you need to be sure that there is enough space left on the stack for the space you want to allocate. VLAs were added to C99 (as a mandatory feature). In C11 and C17, they were optional. In C23, some support for them will become mandatory — the good bits that don't cause trouble (so being able to pass VLA parameters to functions, and dynamically allocating them for use) are required once more. — Jonathan Leffler, Nov 26 '22 at 18:49

score 5 · Answer 1 · answered Nov 26 '22 at 17:51

5

You can dynamically allocate a contiguous 2D array as

int (*arr)[cols] = malloc( rows * sizeof (int [cols]) );

and then access elements as arr[i][j].

If your compiler doesn’t support VLAs, then cols will have to be a constant expression.

answered Nov 26 '22 at 17:51

John Bode

119,563
19
122
198

`sizeof (int [rows][cols]) )` would be work as well – tstanisl Nov 26 '22 at 20:10

John Bollinger · Answer 2 · 2022-11-27T13:53:55.040

Preliminary

no VLAs (I heard they are the devil and to always avoid them and not to talk to people who suggest using them in any scenario.

VLAs are a problem if you want your code to work with Microsoft's stunted C compiler, as MS has steadfastly refused to implement VLA support, even when C99, in which VLA support was mandatory, was the current language standard. Generally speaking, I would suggest avoiding Microsoft's C compiler altogether if you can, but I will stop well short of suggesting the avoidance of people who advise you differently.

VLAs are also a potential problem when you declare an automatic object of VLA type, without managing the maximum dimension. Especially so when the dimension comes from user input. This produces a risk of program crash that is hard to test or mitigate at development time, except by avoiding the situation in the first place.

But it is at best overly dramatic to call VLAs "the devil", and I propose that anyone who actually told you "not to talk to people who suggest using them in any scenario" must not have trusted you to understand the issues involved or to evaluate them for yourself. In particular, pointers to VLAs are a fine way to address all your points besides "no VLAs", and they have no particular technical issues other than lack of support by (mostly) Microsoft. Support for these will be mandatory again in C2X, the next C language specification, though support for some other forms of VLA use will remain optional.

Your requirements

If any of the dimensions of an array type are not given by integer constant expressions, then that type is by definition a variable-length array type. If any dimension but the first of an array type is not given by an integer constant expressions then you cannot express the corresponding pointer type without using a VLA.

Therefore, if you want a contiguously allocated multidimensional array (array of arrays) for which any dimension other than the first is chosen at runtime, then a VLA type must be involved. Allocating such an object dynamically works great and has little or no downside other than lack of support by certain compilers (which is a non-negligible consideration, to be sure). It would look something like this:

void do_something(size_t rows, size_t columns) {
    int (*my_array)[columns];  // pointer to VLA

    my_array = malloc(rows * sizeof(*my_array));

    // ... access elements as my_array[row][col] ...
}

You should have seen similar in some of the Q&As you reference in the question.

If that's not acceptable, then you need to choose which of your other requirements to give up. I would suggest the "multi-dimensional" part. Instead, allocate (effectively) a one-dimensional array, and use it as if it had two dimensions by performing appropriate index computations upon access. This should perform almost as well, because it's pretty close to what the compiler will set up automatically for a multidimensional array. You can make it a bit easier on yourself by creating a macro to assist with the computations. For example,

#define ELEMENT_2D(a, dim2, row, col) ((a)[(row) * (dim2) + (col)])

void do_something(size_t rows, size_t columns) {
    int *my_array;

    my_array = malloc(rows * columns * sizeof(*my_array));

    // ... access elements as ELEMENT_2D(my_array, columns, row, col) ..
}

Alternatively, you could give up the contiguous allocation and go with an array of pointers instead. This is what people who don't understand arrays, pointers, and / or dynamic allocation typically do, and although there are some applications, especially for arrays of pointers to strings, this form has mostly downside relative to contiguous allocation for the kinds of applications where one wants an object they think of as a 2D array.

score 2 · Accepted Answer · answered Nov 26 '22 at 18:17

Often an array of pointers is allocated and then memory is allocated to each pointer.
This could be inverted. Allocate a large contiguous block of memory. Allocate an array of pointers and assign addresses from within the contiguous block.

#include <stdio.h>
#include <stdlib.h>

int **contiguous ( int rows, int cols, int **memory, int **pointers) {
    int *temp = NULL;
    int **ptrtemp = NULL;
    // allocate a large block of memory
    if ( NULL == ( temp = realloc ( *memory, sizeof **memory * rows * cols))) {
        fprintf ( stderr, "problem memory malloc\n");
        return pointers;
    }

    *memory = temp;
    // allocate pointers
    if ( NULL == ( ptrtemp = realloc ( pointers, sizeof *pointers * rows))) {
        fprintf ( stderr, "problem memory malloc\n");
        return pointers;
    }

    pointers = ptrtemp;

    for ( int rw = 0; rw < rows; ++rw) {
        pointers[rw] = &(*memory)[rw * cols]; // assign addresses to pointers
    }

    // assign some values
    for ( int rw = 0; rw < rows; ++rw) {
        for ( int cl = 0; cl < cols; ++cl) {
            pointers[rw][cl] = rw * cols + cl;
        }
    }
    return pointers;
}

int main ( void) {
    int *memory = NULL;
    int **ptrs = NULL;
    int rows = 20;
    int cols = 17;

    if ( ( ptrs = contiguous ( rows, cols, &memory, ptrs))) {
        for ( int rw = 0; rw < rows; ++rw) {
            for ( int cl = 0; cl < cols; ++cl) {
                printf ( "%3d ", ptrs[rw][cl]);
            }
            printf ( "\n");
        }

        free ( memory);
        free ( ptrs);
    }
    return 0;
}

I will accept this answer because it is the one I wanted. Sadly, it appears that the version with one block of memory being accessed using math in the index area is superior. I posted my code here: https://www.reddit.com/r/C_Programming/comments/z62chp/benchmark_for_dynamically_allocating_2d_array/ — iloveclang, Nov 27 '22 at 14:26
I don't think mine is good enough to be accepted, but I'll post it. Maybe someone comments on it and says how it's bad. — iloveclang, Nov 27 '22 at 18:52

score 1 · Answer 4 · edited Nov 26 '22 at 19:15

1

Suppose you need a 2D array of size W x H containing ints (where H is the number of rows, and W the number of columns).
Then you can do the the following:

Allocation:

int * a = malloc(W * H * sizeof(int));

Access element at location (i,j):

int val = a[j * W + i];
a[j * W + i] = val;

The whole array would occupy a continuous block of memory, and can be dynamically allocated (without VLAs). Being a continuous block of memory offers an advantage over an array of pointers due to [potentially] fewer cache misses.

In such an array, the term "stride" refers to the offset between one row to another. If you need to use padding e.g. to make sure all lines start at some aligned address, you can use a stride which is bigger than W.

edited Nov 26 '22 at 19:15

Jonathan Leffler

730,956
141
904
1,278

answered Nov 26 '22 at 17:27

wohlstad

12,661
10
26
39

but in this scenario, aren't you just replacing the slowdown of non-contignuous memory allocation of the pointer-to-pointer method, for a probably even larger slowdown because you now have to do math to access every single member? – iloveclang Nov 26 '22 at 19:32
1

This is something that needs to be measured in your specific case. From my experience, in the cases I had, the overhead of the multiplication and addition was negligible. – wohlstad Nov 26 '22 at 19:50

score 1 · Answer 5 · answered Nov 27 '22 at 18:59

I did a benchmark between:

the classic pointer to array of pointers to individually malloc'd memory
one pointer to contignuous memory, accessed with a[x * COLS + y]
a mix of both - pointer to array of pointers to sliced up malloc'd contignuous memory

TL;DR:
the second one appears to be faster by 2-12% compared to the others, which are sort of similar in performance.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define ROWS    100
#define COLS    100
#define LOOPS   100
#define NORMAL  0
#define SINGLE  1
#define HYBRID  2

int **x_normal;                                             /* global vars to make it more equal */
int *y_single;
int *z_hybrid_memory;
int **z_hybrid_pointers;
int copy_array[ROWS][COLS];

void x_normal_write(int magic) {                            /* magic number to prevent compiler from optimizing it */
    int i, ii;
    for (i = 0; i < ROWS; i++) {
        for (ii = 0; ii < COLS; ii++) {
            x_normal[i][ii] = (i * COLS + ii + magic);
        }
    }
}

void y_single_write(int magic) {
    int i, ii;
    for (i = 0; i < ROWS; i++) {
        for (ii = 0; ii < COLS; ii++) {
            y_single[i * COLS + ii] = (i * COLS + ii + magic);
        }
    }
}

void z_hybrid_write(int magic) {
    int i, ii;
    for (i = 0; i < ROWS; i++) {
        for (ii = 0; ii < COLS; ii++) {
            z_hybrid_pointers[i][ii] = (i * COLS + ii + magic);
        }
    }
}

void x_normal_copy(void) {
    int i, ii;
    for (i = 0; i < ROWS; i++) {
        for (ii = 0; ii < COLS; ii++) {
            copy_array[i][ii] = x_normal[i][ii];
        }
    }
}

void y_single_copy(void) {
    int i, ii;
    for (i = 0; i < ROWS; i++) {
        for (ii = 0; ii < COLS; ii++) {
            copy_array[i][ii] = y_single[i * COLS + ii];
        }
    }
}

void z_hybrid_copy(void) {
    int i, ii;
    for (i = 0; i < ROWS; i++) {
        for (ii = 0; ii < COLS; ii++) {
            copy_array[i][ii] = z_hybrid_pointers[i][ii];
        }
    }
}

int main() {
    int i;
    clock_t start, end;
    double times_read[3][LOOPS];
    double times_write[3][LOOPS];

    /* MALLOC X_NORMAL 1/2 */
    x_normal = malloc(ROWS * sizeof(int*));                 /* rows */
    for (i = 0; i < ROWS; i+=2) {                           /* malloc every other row to ensure memory isn't contignuous */
        x_normal[i] = malloc(COLS * sizeof(int));           /* columns for each row (1/2) */
    }
    
    /* MALLOC Y_SINGLE */
    y_single = malloc(ROWS * COLS * sizeof(int));           /* all in one contignuous memory */

    /* MALLOC Z_HYBRID */
    z_hybrid_memory = malloc(ROWS * COLS * sizeof(int));    /* memory part - with a big chunk of contignuous memory */
    z_hybrid_pointers = malloc(ROWS * sizeof(int*));        /* pointer part - like in normal */
    for (i = 0; i < ROWS; i++) {                            /* assign addresses to pointers from "memory", spaced out by COLS */
        z_hybrid_pointers[i] = &z_hybrid_memory[(i * COLS)]; 
    }

    /* MALLOC X_NORMAL 2/2 */
    for (i = 1; i < ROWS; i+=2) {                           /* malloc every other row to ensure memory isn't contignuous */
        x_normal[i] = malloc(COLS * sizeof(int));           /* columns for each row (2/2) */
    }

    /* TEST */
    for (i = 0; i < LOOPS; i++) {
        /* NORMAL WRITE */
        start = clock();
        x_normal_write(i);
        end = clock();
        times_write[NORMAL][i] = (double)(end - start);

        /* SINGLE WRITE */
        start = clock();
        y_single_write(i);
        end = clock();
        times_write[SINGLE][i] = (double)(end - start);

        /* HYBRID WRITE */
        start = clock();
        z_hybrid_write(i);
        end = clock();
        times_write[HYBRID][i] = (double)(end - start);

        /* NORMAL READ */
        start = clock();
        x_normal_copy();
        end = clock();
        times_read[NORMAL][i] = (double)(end - start);

        /* SINGLE READ */
        start = clock();
        y_single_copy();
        end = clock();
        times_read[SINGLE][i] = (double)(end - start);

        /* HYBRID READ */
        start = clock();
        z_hybrid_copy();
        end = clock();
        times_read[HYBRID][i] = (double)(end - start);
    }

    /* REPORT FINDINGS */
    printf("CLOCKS NEEDED FOR:\n\nREAD\tNORMAL\tSINGLE\tHYBRID\tWRITE\tNORMAL\tSINGLE\tHYBRID\n\n");
    for (i = 0; i < LOOPS; i++) {
        printf(
            "\t%.1f\t%.1f\t%.1f\t\t%.1f\t%.1f\t%.1f\n", 
            times_read[NORMAL][i], times_read[SINGLE][i], times_read[HYBRID][i],
            times_write[NORMAL][i], times_write[SINGLE][i], times_write[HYBRID][i]
        );
        /* USE [0] to get totals */
        times_read[NORMAL][0] += times_read[NORMAL][i];
        times_read[SINGLE][0] += times_read[SINGLE][i];
        times_read[HYBRID][0] += times_read[HYBRID][i];
        times_write[NORMAL][0] += times_write[NORMAL][i];
        times_write[SINGLE][0] += times_write[SINGLE][i];
        times_write[HYBRID][0] += times_write[HYBRID][i];
    }
    printf("TOTAL:\n\t%.1f\t%.1f\t%.1f\t\t%.1f\t%.1f\t%.1f\n",
        times_read[NORMAL][0], times_read[SINGLE][0], times_read[HYBRID][0],
        times_write[NORMAL][0], times_write[SINGLE][0], times_write[HYBRID][0]
    );
    printf("AVERAGE:\n\t%.1f\t%.1f\t%.1f\t\t%.1f\t%.1f\t%.1f\n",
        (times_read[NORMAL][0] / LOOPS), (times_read[SINGLE][0] / LOOPS), (times_read[HYBRID][0] / LOOPS),
        (times_write[NORMAL][0] / LOOPS), (times_write[SINGLE][0] / LOOPS), (times_write[HYBRID][0] / LOOPS)
    );

    return 0;
}

Though maybe this is not the best approach since the result can be tainted by random stuff - such as perhaps the proximity of the source arrays to the destination array for copy functions (though the numbers are consistent for reads and writes. Perhaps someone can expand on this.)

Dynamically allocate contiguous memory for a "rectangular 2d array", without using VLAs

5 Answers5