performance of fortran matrix operations

Question

I need to use Fortran instead of C somewhere and I am very new to Fortran. I am trying to do some big calculations but it is quite slow comparing to C (maybe 10x or more and I am using Intel's compilers for both). I think the reason is Fortran keeps the matrix in column major format, and I am trying to do operations like sum(matrix(i, j, :)), because it is column major, probably this uses the cache very inefficiently (probably not using at all). However, I am not sure if this is the actual reason (since I know so less about Fortran). Question is, the convention in Fortran is to do operations on column vectors instead of row vectors ?

(BTW: I checked the speed of Fortran already using Intel's LAPACK libraries, and it is quite fast, so it is not related to any compiler or build issue.)

Thanks.

Mete

FORTRAN is typically faster than C at matrix operations, so I would guess the problem is in your code. — , May 17 '11 at 11:41

Paul R · Accepted Answer · 2011-05-18T06:52:43.887

4

Try changing the order of your loops when doing matrix operations, e.g. if you have something like this in C:

for (i = 0; i < M; ++i) // for each row
{
    for (j = 0; j < N; ++j) // for each col
    {
        // matrix operations on e.g. A[i][j]
    }
}

then in Fortran you want the j (column) loop as the outer loop and the i (row) loop as the inner loop.

An alternative approach, which achieves the same thing, is to keep the loops as they are but change the definition of the array, e.g. if in C it's A[x][y][z][t] then in FORTRAN make it A[t][z][y][x], assuming that t is the fastest varying loop index, and x the slowest.

edited May 18 '11 at 06:52

answered May 17 '11 at 11:42

Paul R

208,748
37
389
560

I said matrix but it is actually 4D; x,y,z,t. I need t vector for each spatial location. so I think I cannot easily change the loop (I dont think I can make calculation that way). Is there an easy way to transform it to t,x,y,z format (I mean a Fortran specific easier way). Sorry if I am talking nonsense, really I am not used to thinking Fortran way. – mete May 17 '11 at 19:12

score 4 · Answer 2 · answered May 17 '11 at 15:33

Since, as you write, Fortran is column major with the first index varying fastest in memory layout, so sum(matrix(i, j, :)) causes the summation of non-contiguous locations. If this is really the cause of slower operation, then you could redefine your matrix to have a different order of dimensions so that the current 3rd dimension is the 1st. Yes, if this is your main computation, rearrange the matrix to make the summation a column operation. Explicit looping should be as earlier indices fastest, as described by @PaulR. If you had previously thought of the optimum index order for C and are changing to Fortran, this is one aspect that might need changing. But while this is theoretically true, I doubt that it really matters that much in practice, unless perhaps the array is enormous. (The worse case would be that part of the array is in RAM and part in swap on disk!) The first rule about run-time speed issues is don't guess ... measure. It is usually the algorithm.

I dont know what enermous mean but surely it doesnt fit to cache easily (apprx 2^24 doubles) I experienced same before in C also. Algorithm is simple, in spatio-temporal data I am calculating pearson correlation coeff for each pair of data (samples in each spatial location). — mete, May 17 '11 at 19:19
It's not about fitting the whole data structure into cache, it's about ensuring that once you pull in a cache line from main memory, you can use many of the values in quick succession. — Jonathan Dursi, May 17 '11 at 21:11

performance of fortran matrix operations

2 Answers2

Linked