How does C allocate space for a 2D (3D...) array when using malloc?

Question

I am having a problem understanding how C allocates space for a 2D (or more dimensional) array, especially when we use malloc and the like. Take the program in this question for example.

A one-dimensional array of pointers is first defined, then pointers to arrays of 1D data (in this case strings) are put in each one of boxes of the first 1D array. So there is no guarantee that the whole 2D array is contiguous (the last cell of the previous row is followed by the first cell of the next row). Each 1D array of data can be very distant, only their pointers are contiguous. Am I correct or am I missing something? I would really appreciate if you could help me clarify this.

Another way to allocate 2D arrays is with a single buffer and then indexing it based on 2D coordinates. 8 * 8 = 64. Allocate a single 64 byte buffer and index = x + y * 8 — Justin Meiners, Jul 11 '13 at 02:52
So defining a 2D array like the questions above is indeed not contiguous. Thanks. I understand how to associate 8x8=64 spaces, could you give me an example how how I can index it based on 2D coordinates? Thank you. — makhlaghi, Jul 11 '13 at 02:56

score 8 · Accepted Answer · answered Jul 11 '13 at 03:12

There are various ways of doing it, depending on how you're going to access it. You can ensure that the main body of the array is contiguous, or you can avoid that. For arrays of strings, you often don't bother with making the body of the array contiguous. For 2D (etc) arrays of integers or doubles, you usually do make the body of the array contiguous.

In the examples, the data type of the array is the generic type T, assumed to be numeric so the array elements can be assigned 0. The examples do not error check the memory allocations; they should in production code.

Array access with computed indexes — contiguous array body

int n1 = 5;
int n2 = 6;

T *a = malloc(n1 * n2 * sizeof(T));

for (int i = 0; i < n1; i++)
    for (int j = 0; j < n2; j++)
        a[i * n2 + j] = 0;

free(a);

Array access with double subscripts — contiguous array body

int n1 = 5;
int n2 = 6;

T **a = malloc(n1 * sizeof(T*));
T  *b = malloc(n1 * n2 * sizeof(T));

for (int i = 0; i < n1; i++)
    a[i] = &b[i * n2];

for (int i = 0; i < n1; i++)
    for (int j = 0; j < n2; j++)
        a[i][j] = 0;

free(b);
free(a);

Array access with double subscripts — discontiguous array body

int n1 = 5;
int n2 = 6;

T **a = malloc(n1 * sizeof(T*));

for (int i = 0; i < n1; i++)
    a[i] = malloc(n2 * sizeof(T));

for (int i = 0; i < n1; i++)
    for (int j = 0; j < n2; j++)
        a[i][j] = 0;

for (int i = 0; i < n1; i++)
    free(a[i]);
free(a);

Thank you very much for the complete examples. I totally understand now. — makhlaghi, Jul 11 '13 at 03:22
Just one question: The "Array access with computed indexes — contiguous array body" seems to be simpler (thus faster) than the "Array access with double subscripts — contiguous array body", am I correct? — makhlaghi, Jul 11 '13 at 03:38
It is a question of speed of multiplication and addition versus memory accesses. These days, the chances are that the computation is quicker than memory access, but the notational convenience of the double subscript is not negligible. It's a swings and roundabouts game. Choose what works for you; measure the performance if it matters. — Jonathan Leffler, Jul 11 '13 at 03:49

Justin Meiners · Answer 2 · 2013-07-11T03:08:31.023

Method 1 (pointers to buffers, non contiguous)

You are correct, there is no guarantee the data will contiguous and in fact it most likely will not be. The top level array (rows) is simply a 1D array of pointers (each element is it's own pointer). These pointers each point to their own 1D array of actual objects. These buffers are only connected through pointers.

linked

/* allocation */
int** array = malloc(sizeof(int*) * height)
for (int y = 0; y < height; y ++)
{
   array[i] = malloc(sizeof(int) * width);
}
/* indexing */
int item = array[y][x];

Method 2 (single buffer, contiguous)

Another way to allocate 2D arrays is with a single buffer and then indexing it based on 2D coordinates. e.g. 8 * 8 = 64. Allocate a single 64 byte buffer and index = x + y * 8. This method stores data contiguously and it is much easier to allocate and deallocate than method 1.

contiguous

/* allocation */
int* array = malloc(sizeof(int) * width * height)
/* indexing */
int item = array[x + y * width];

I was in the mindset that a contiguous array is easier (thus faster) for the computer to analyze, is this understanding correct? I understand that for a small data set it doesn't make much difference, but I assume it will for a large one (a large image for example). Am I correct? — makhlaghi, Jul 11 '13 at 03:13
@astroboy contiguous data layouts are more cache friendly for iteration and random access with small to large datasets. I would imagine it is noticeably faster for anything but the smallest datasets. It is easier to malloc and free as well. — Justin Meiners, Jul 11 '13 at 03:15

score 2 · Answer 3 · answered Jul 11 '13 at 03:00

2

I think you are right. But if you really want the array to be contiguous, you can malloc an 1D-array and use it like a 2D one, like

int* oneDArray = (int*)malloc(sizeof(int)*10*10);
int a = oneDArray[i*10+j];     //which equals to twoDArray[i][j]

answered Jul 11 '13 at 03:00

Manas

598
3
14