Which representation of a 2D matrix is faster

Question

which way is faster and compiler/cache friendlier, use M[a][b] or M[a*b] when working with matrices?

I tried writing both ways on compiler explorer in a function that allocates, initialises and returns a matrix but I don't know assembly and how much time each instruction takes

int **M = malloc(sizeof(int*)*m)
for(i=0; i<m; ++i) {
  *M = malloc(sizeof(int)*n);
  for(int j = 0; j < n; ++j){
    M[j] = j;
  }

vs

int *M = malloc(m*n*sizeof(int));
for(i = 0; i < m*n; ++i) M[i] = i;

I expect the second way to be faster.

the second way doesn't initialize a 2D matrix, it is just an array of length `m*n` — FirePapaya, May 31 '19 at 04:31
https://stackoverflow.com/questions/8740195/how-do-we-allocate-a-2-d-array-using-one-malloc-statement — Antti Haapala -- Слава Україні, May 31 '19 at 04:49
The first fragment leaks memory. The line `*M = malloc(…);` should be `M[i] = malloc(…);` to make much sense. You then need to use `M[i][j] = j;` in the inner loop. Even with those fixed, the values in the matrices would be different. With the second matrix, you'd calculate `M[row * num_cols + col]` to access the element at row `row` and column `col`; with the first matrix, you'd just write `M[row][col]`. Note, though, that the first matrix would require two memory accesses (one to get the value in `M[row]`, another to get the value in `M[row][col]`, whereas the second requires just one. — Jonathan Leffler, May 31 '19 at 05:53

score 1 · Accepted Answer · answered May 31 '19 at 09:39

The code with with the malloc calls will be slower. More interesting how fast access to the particular cell is

void foo(int * const * const M, const size_t x, const size_t y, const int val)
{
    M[x][y] = val;
}

void foo2(int * const M, const size_t x, const size_t y, const size_t rowsize, const int val)
{
    M[x + rowsize * y] = val;
}

https://godbolt.org/z/iv0VPV

foo:
        mov     rax, QWORD PTR [rdi+rsi*8]
        mov     DWORD PTR [rax+rdx*4], ecx
        ret
foo2:
        imul    rcx, rdx
        add     rcx, rsi
        mov     DWORD PTR [rdi+rcx*4], r8d
        ret

the result is obvious;

I'm not good at Assembly. You mean the second one is faster because `imul` is faster than memory access? — apadana, Jul 31 '21 at 15:35

score 0 · Answer 2 · answered May 31 '19 at 06:26

If your problem (and the respective solution) require a 2D array, then just use a 2D array: M[a][b].

You must have in mind that the memory is addressed linearly anyway. The concepts of multi-dimensional arrays are just a layer implemented on top of the linear memory.

Nowadays, the compilers are quite highly optimized, so they will do a better job of "linearizeing" a 2D array, than you can. Additionally, if you do it, the code will be a lot more difficult to write and maintain.

Fouad · Answer 3 · 2019-05-31T09:15:16.133

-1

You can use clock_t to track the time of block of code.

and here's your code after some updates.

#include<stdio.h>
#include<time.h>

int main()
{
    int i = 0, j = 0;
    int m, n;
    scanf("%d %d", &m, &n);
    clock_t start, end;
    double time_used;
    start = clock();

    int **M = malloc(sizeof(int*)*m);

    for (i = 0; i < m; ++i) {
        *M = malloc(sizeof(int)*n);
        for (int j = 0; j < n; ++j) {
            M[j] = j;
        }
    }
        end = clock();
        time_used = ((double)(end - start)) / CLOCKS_PER_SEC;

        printf("Time used for fisrst code is : %f \n ", time_used);
        start = clock();
        M = malloc(m*n * sizeof(int));
        for (i = 0; i < m*n; ++i) M[i] = i;
        end = clock();
        time_used = ((double)(end - start)) / CLOCKS_PER_SEC;
        printf("Time used for second code is : %f \n ", time_used);


        return 0;
}

and the output for this code is when entering 10000*10000 matrix

Time used for first code is : 0.001000

Time used for first code is : 0.686000

This means that the second code takes more time than first one.

edited May 31 '19 at 09:15

answered May 31 '19 at 06:36

Fouad

39
3

are you sure you compile it with optimizations enabled? And in Windows QueryPerformanceCounter should be used instead of clock_t – phuclv May 31 '19 at 06:47
this output is after running on full optimization mode first code : 0.00000 second code : 0.22500 – Fouad May 31 '19 at 07:33
1

The code is totally wrong so the result is also wrong. If the code is correctly indented it becomes obviously that the `}` are placed incorrectly. As the code is now, the first measurement only use `for (int j = 0; j < n; ++j) {` while the second use `for (i = 0; i < m*n; ++i)` In other words - the second measurement has m times more iterations! Consequently all numbers here are wrong. – Support Ukraine May 31 '19 at 07:49
I didn't update anything in code structure , i just surround his code by clock to calculate the time and it gives me this result – Fouad May 31 '19 at 08:02
@Fouad You inserted an extra `}` that are missing in OPs code. But you placed it incorrectly. – Support Ukraine May 31 '19 at 08:51
Ops, you are right , i updated the code and the output is that first code : 0.329000 second code : 0.279000 seconds – Fouad May 31 '19 at 09:14
@Fouad I can see the code has been updated but you have not updated the measurement results - please do that as well – Support Ukraine May 31 '19 at 09:42

Which representation of a 2D matrix is faster

3 Answers3