5

I have a matrix of ints named A, and when I'm iterating trough it by columns instead of rows, it runs about 50 ms slower:

for(int i=0;i<n;i++)  
    for(int j=0;j<n;j++)  
        cout<<A[j][i];    //slower than of A[i][j]

Does anyone know why does this happen? I've asked a couple of people, but none of them knew why. I'm sure it's related to how the addresses are represented in computer's memory, but still, I'd like to find a more concrete answer.

Mihai Bujanca
  • 4,089
  • 10
  • 43
  • 84

5 Answers5

8

Iterating through the matrix row by row is faster because of cache memory.

When you access A[i][j], there is more memory being loaded into the cache than just one element. Note that each row of your matrix is stored within continuous memory block, thus when the memory "around" A[i][j] is still in cache, it is more probable that accessing next element within the same row will result in it being read from cache rather than main memory (see cache miss).

Also see related questions:
Why does the order of the loops affect performance when iterating over a 2D array?
Which of these two for loops is more efficient in terms of time and cache performance
How cache memory works?
Matrix multiplication: Small difference in matrix size, large difference in timings

Community
  • 1
  • 1
LihO
  • 41,190
  • 11
  • 99
  • 167
3

A 2D array is stored in memory as a 1D array, in (row/column) major. What this means is that an array with 5 columns, might be stored as 5 columns one after the other, so based on how you access vs this ordering, your accesses might be cached, or every one of them might cause cache fails, causing the large difference in performance.

Karthik T
  • 31,456
  • 5
  • 68
  • 87
3

It's about cache line read mechanism. Read about spatial locality.

To verify, try to disable cache while running this application. (I forgot how to do this, but it can be done.)

anishsane
  • 20,270
  • 5
  • 40
  • 73
3

AS others have noted, it is a cache issue. Using it one way may cause a cache miss each time an array element is accessed.

The cache issue is actually a very important factor for optimizations. It's the reason why it is sometimes better to do a structure of arrays instead of array of structures. Compare these two:

struct StructOfArrays {
  int values[100];
  char names[100][100];
}

struct StructOfArrays values;

struct NormalValStruct {
  int val;
  char name[100];
}

struct NormalValStruct values[100];

If you iterate over values in StructOfArrays they will be probably loaded into cache and read efficiently. When you iterate over NormalValStruct and get the value member, you will get a cache miss every other time.

That trick is often used in high-performance applications. Which are, often, games.

Dariusz
  • 21,561
  • 9
  • 74
  • 114
2

Because the first loop accesses memory linear the other with gaps in between. Thus the first loop is friendlier for the cache.

Udo Klein
  • 6,784
  • 1
  • 36
  • 61