0

which way is faster and compiler/cache friendlier, use M[a][b] or M[a*b] when working with matrices?

I tried writing both ways on compiler explorer in a function that allocates, initialises and returns a matrix but I don't know assembly and how much time each instruction takes

int **M = malloc(sizeof(int*)*m)
for(i=0; i<m; ++i) {
  *M = malloc(sizeof(int)*n);
  for(int j = 0; j < n; ++j){
    M[j] = j;
  }

vs

int *M = malloc(m*n*sizeof(int));
for(i = 0; i < m*n; ++i) M[i] = i;

I expect the second way to be faster.

ninazzo
  • 547
  • 1
  • 6
  • 17
  • the second way doesn't initialize a 2D matrix, it is just an array of length `m*n` – FirePapaya May 31 '19 at 04:31
  • 2
    @TimBeam neither does the first one... it is a jagged array – Antti Haapala -- Слава Україні May 31 '19 at 04:45
  • 1
    https://stackoverflow.com/questions/8740195/how-do-we-allocate-a-2-d-array-using-one-malloc-statement – Antti Haapala -- Слава Україні May 31 '19 at 04:49
  • 1
    The first fragment leaks memory. The line `*M = malloc(…);` should be `M[i] = malloc(…);` to make much sense. You then need to use `M[i][j] = j;` in the inner loop. Even with those fixed, the values in the matrices would be different. With the second matrix, you'd calculate `M[row * num_cols + col]` to access the element at row `row` and column `col`; with the first matrix, you'd just write `M[row][col]`. Note, though, that the first matrix would require two memory accesses (one to get the value in `M[row]`, another to get the value in `M[row][col]`, whereas the second requires just one. – Jonathan Leffler May 31 '19 at 05:53

3 Answers3

1

The code with with the malloc calls will be slower. More interesting how fast access to the particular cell is

void foo(int * const * const M, const size_t x, const size_t y, const int val)
{
    M[x][y] = val;
}

void foo2(int * const M, const size_t x, const size_t y, const size_t rowsize, const int val)
{
    M[x + rowsize * y] = val;
}

https://godbolt.org/z/iv0VPV

foo:
        mov     rax, QWORD PTR [rdi+rsi*8]
        mov     DWORD PTR [rax+rdx*4], ecx
        ret
foo2:
        imul    rcx, rdx
        add     rcx, rsi
        mov     DWORD PTR [rdi+rcx*4], r8d
        ret

the result is obvious;

0___________
  • 60,014
  • 4
  • 34
  • 74
  • I'm not good at Assembly. You mean the second one is faster because `imul` is faster than memory access? – apadana Jul 31 '21 at 15:35
0

If your problem (and the respective solution) require a 2D array, then just use a 2D array: M[a][b].

You must have in mind that the memory is addressed linearly anyway. The concepts of multi-dimensional arrays are just a layer implemented on top of the linear memory.

Nowadays, the compilers are quite highly optimized, so they will do a better job of "linearizeing" a 2D array, than you can. Additionally, if you do it, the code will be a lot more difficult to write and maintain.

virolino
  • 2,073
  • 5
  • 21
-1

You can use clock_t to track the time of block of code.

and here's your code after some updates.


#include<stdio.h>
#include<time.h>

int main()
{
    int i = 0, j = 0;
    int m, n;
    scanf("%d %d", &m, &n);
    clock_t start, end;
    double time_used;
    start = clock();

    int **M = malloc(sizeof(int*)*m);

    for (i = 0; i < m; ++i) {
        *M = malloc(sizeof(int)*n);
        for (int j = 0; j < n; ++j) {
            M[j] = j;
        }
    }
        end = clock();
        time_used = ((double)(end - start)) / CLOCKS_PER_SEC;

        printf("Time used for fisrst code is : %f \n ", time_used);
        start = clock();
        M = malloc(m*n * sizeof(int));
        for (i = 0; i < m*n; ++i) M[i] = i;
        end = clock();
        time_used = ((double)(end - start)) / CLOCKS_PER_SEC;
        printf("Time used for second code is : %f \n ", time_used);


        return 0;
}

and the output for this code is when entering 10000*10000 matrix


Time used for first code is : 0.001000


Time used for first code is : 0.686000


This means that the second code takes more time than first one. enter image description here

Fouad
  • 39
  • 3
  • are you sure you compile it with optimizations enabled? And in Windows QueryPerformanceCounter should be used instead of clock_t – phuclv May 31 '19 at 06:47
  • this output is after running on full optimization mode first code : 0.00000 second code : 0.22500 – Fouad May 31 '19 at 07:33
  • 1
    The code is totally wrong so the result is also wrong. If the code is correctly indented it becomes obviously that the `}` are placed incorrectly. As the code is now, the first measurement only use `for (int j = 0; j < n; ++j) {` while the second use `for (i = 0; i < m*n; ++i)` In other words - the second measurement has m times more iterations! Consequently all numbers here are wrong. – Support Ukraine May 31 '19 at 07:49
  • I didn't update anything in code structure , i just surround his code by clock to calculate the time and it gives me this result – Fouad May 31 '19 at 08:02
  • @Fouad You inserted an extra `}` that are missing in OPs code. But you placed it incorrectly. – Support Ukraine May 31 '19 at 08:51
  • Ops, you are right , i updated the code and the output is that first code : 0.329000 second code : 0.279000 seconds – Fouad May 31 '19 at 09:14
  • @Fouad I can see the code has been updated but you have not updated the measurement results - please do that as well – Support Ukraine May 31 '19 at 09:42