I wonder if anyone could advise on how to store large (say 2000 x 2000 x 2000) 3D arrays for finite-difference computations. Does contiguous storage (float*) give better performance than float*** on modern CPU architectures?
Here is a simplified example of the computation, which is done over the entire arrays:
for i ...
    for j ...
        for k ...
            u[i][j][k] += v[i][j][k+1] + v[i][j][k-1]
                        + v[i][j+1][k] + v[i][j-1][k]
                        + v[i+1][j][k] + v[i-1][j][k];
versus
u[i * iStride + j * jStride + k] += ...
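For concreteness, the two variants being compared could be sketched in C roughly as follows. This is only an illustration under assumptions not stated above: a cubic grid of side n, k as the fastest-varying index (so jStride = n and iStride = n * n), updates on interior points only, and the hypothetical names stencil_flat, stencil_ptr, make_view and free_view. It shows the two access patterns side by side and makes no claim about which is faster.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Flat layout: one contiguous block, indexed as i*iStride + j*jStride + k.
   Assumes a cubic n x n x n grid; only interior points are updated. */
void stencil_flat(float *u, const float *v, size_t n)
{
    assert(n >= 3);
    const size_t jStride = n, iStride = n * n;
    for (size_t i = 1; i + 1 < n; ++i)
        for (size_t j = 1; j + 1 < n; ++j) {
            /* Row base computed once per (i, j); inner loop is unit stride. */
            const size_t row = i * iStride + j * jStride;
            for (size_t k = 1; k + 1 < n; ++k) {
                const size_t c = row + k;
                u[c] += v[c + 1] + v[c - 1]
                      + v[c + jStride] + v[c - jStride]
                      + v[c + iStride] + v[c - iStride];
            }
        }
}

/* Pointer-table layout: u[i][j][k] goes through three dereferences. */
void stencil_ptr(float ***u, float ***v, size_t n)
{
    assert(n >= 3);
    for (size_t i = 1; i + 1 < n; ++i)
        for (size_t j = 1; j + 1 < n; ++j)
            for (size_t k = 1; k + 1 < n; ++k)
                u[i][j][k] += v[i][j][k + 1] + v[i][j][k - 1]
                            + v[i][j + 1][k] + v[i][j - 1][k]
                            + v[i + 1][j][k] + v[i - 1][j][k];
}

/* Build a float*** table over an existing flat block (hypothetical helper).
   Returns NULL on allocation failure. */
float ***make_view(float *data, size_t n)
{
    float ***t = malloc(n * sizeof *t);
    float **rows = malloc(n * n * sizeof *rows);
    if (!t || !rows) {
        free(t);
        free(rows);
        return NULL;
    }
    for (size_t i = 0; i < n; ++i) {
        t[i] = rows + i * n;
        for (size_t j = 0; j < n; ++j)
            t[i][j] = data + (i * n + j) * n;
    }
    return t;
}

/* Release the pointer tables only; the flat data block is owned elsewhere. */
void free_view(float ***t)
{
    if (t) {
        free(t[0]); /* t[0] is the start of the row-pointer block */
        free(t);
    }
}
```

Because make_view points the tables into the same contiguous block, both functions touch memory in the same order. The only difference is how the address of each element is obtained.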
PS: Considering the size of the problems, storing T*** is a very small overhead. Access is not random. Moreover, I do loop blocking to minimize cache misses. I am just wondering how triple dereferencing in the T*** case compares to index computation and a single dereference in the case of a 1D array.
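The blocking scheme actually used is not shown here, so the following is only a guess at what it might look like for the flat layout. It tiles the j and k loops with hypothetical block sizes bj and bk, and assumes a cubic grid of side n with interior-only updates. The function name stencil_flat_blocked is also made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* Same 7-point update on a flat array, with the j and k loops tiled.
   bj and bk are tile sizes chosen by the caller (assumed, to be tuned). */
void stencil_flat_blocked(float *u, const float *v, size_t n,
                          size_t bj, size_t bk)
{
    assert(n >= 3 && bj > 0 && bk > 0);
    const size_t jStride = n, iStride = n * n;
    for (size_t jj = 1; jj + 1 < n; jj += bj) {
        const size_t jEnd = min_sz(jj + bj, n - 1);
        for (size_t kk = 1; kk + 1 < n; kk += bk) {
            const size_t kEnd = min_sz(kk + bk, n - 1);
            /* Sweep all i-planes for this (j, k) tile. */
            for (size_t i = 1; i + 1 < n; ++i)
                for (size_t j = jj; j < jEnd; ++j) {
                    const size_t row = i * iStride + j * jStride;
                    for (size_t k = kk; k < kEnd; ++k) {
                        const size_t c = row + k;
                        u[c] += v[c + 1] + v[c - 1]
                              + v[c + jStride] + v[c - jStride]
                              + v[c + iStride] + v[c - iStride];
                    }
                }
        }
    }
}
```

In this form the index arithmetic inside the innermost loop is just row + k, with the row base computed once per (i, j) pair.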