iterating through matrix is slower when changing A[i][j] to A[j][i]

Question

I have a matrix of ints named A, and when I'm iterating trough it by columns instead of rows, it runs about 50 ms slower:

for(int i=0;i<n;i++)  
    for(int j=0;j<n;j++)  
        cout<<A[j][i];    //slower than of A[i][j]

Does anyone know why does this happen? I've asked a couple of people, but none of them knew why. I'm sure it's related to how the addresses are represented in computer's memory, but still, I'd like to find a more concrete answer.

can you post the code where you initialize the array? – Thomas Mondel Feb 25 '13 at 08:07 — Thomas Mondel, Feb 25 '13 at 08:07
it's a global array, it's just `int A[100][100]` – Mihai Bujanca Feb 25 '13 at 08:19 — Mihai Bujanca, Feb 25 '13 at 08:19

score 8 · Accepted Answer · edited May 23 '17 at 12:18

Iterating through the matrix row by row is faster because of cache memory.

When you access A[i][j], there is more memory being loaded into the cache than just one element. Note that each row of your matrix is stored within continuous memory block, thus when the memory "around" A[i][j] is still in cache, it is more probable that accessing next element within the same row will result in it being read from cache rather than main memory (see cache miss).

Also see related questions:
Why does the order of the loops affect performance when iterating over a 2D array?
Which of these two for loops is more efficient in terms of time and cache performance
How cache memory works?
Matrix multiplication: Small difference in matrix size, large difference in timings

score 3 · Answer 2 · answered Feb 25 '13 at 08:07

A 2D array is stored in memory as a 1D array, in (row/column) major. What this means is that an array with 5 columns, might be stored as 5 columns one after the other, so based on how you access vs this ordering, your accesses might be cached, or every one of them might cause cache fails, causing the large difference in performance.

anishsane · Answer 3 · 2013-02-25T08:13:27.653

3

It's about cache line read mechanism. Read about spatial locality.

To verify, try to disable cache while running this application. (I forgot how to do this, but it can be done.)

edited Feb 25 '13 at 08:13

answered Feb 25 '13 at 08:08

anishsane

20,270
5
40
73

score 3 · Answer 4 · answered Feb 25 '13 at 08:40

AS others have noted, it is a cache issue. Using it one way may cause a cache miss each time an array element is accessed.

The cache issue is actually a very important factor for optimizations. It's the reason why it is sometimes better to do a structure of arrays instead of array of structures. Compare these two:

struct StructOfArrays {
  int values[100];
  char names[100][100];
}

struct StructOfArrays values;

struct NormalValStruct {
  int val;
  char name[100];
}

struct NormalValStruct values[100];

If you iterate over values in StructOfArrays they will be probably loaded into cache and read efficiently. When you iterate over NormalValStruct and get the value member, you will get a cache miss every other time.

That trick is often used in high-performance applications. Which are, often, games.

score 2 · Answer 5 · answered Feb 25 '13 at 08:06

2

Because the first loop accesses memory linear the other with gaps in between. Thus the first loop is friendlier for the cache.

answered Feb 25 '13 at 08:06

Udo Klein

6,784
1
36
61

iterating through matrix is slower when changing A[i][j] to A[j][i]

5 Answers5