0

I have two programs, compiled with g++ and executed on linux. Program 1 creates a 2D array and then measures how long it takes to access all of its elements 100000 times:

#include <time.h>
#include <iostream>

int main()
{
  clock_t time;
  int i, y, x;

  int matrix[9][9]{{ 0,  1,  2,  3,  4,  5,  6,  7,  8},
                   { 9, 10, 11, 12, 13, 14, 15, 16, 17},
                   {18, 19, 20, 21, 22, 23, 24, 25, 26},
                   {27, 28, 29, 30, 31, 32, 33, 34, 35},
                   {36, 37, 38, 39, 40, 41, 42, 43, 44},
                   {45, 46, 47, 48, 49, 50, 51, 52, 53},
                   {54, 55, 56, 57, 58, 59, 60, 61, 62},
                   {63, 64, 65, 66, 67, 68, 69, 70, 71},
                   {72, 73, 74, 75, 76, 77, 78, 79, 80}};

  time = clock();

  for (i = 0; i < 100000; i++)
  {
    for (x = 0; x < 9; x++)
    {
      for (y = 0; y < 9; y++)
      {
        matrix[x][y];
      }
    }
  }

  time = clock() - time;
  std::cout << "Clicks:     " << time << std::endl;
  std::cout << "Time taken: " << (double) time / CLOCKS_PER_SEC << "s" << std::endl;
}

Program 2 creates a 1D array and also measures how long it takes to access all of its elements 100000 times:

#include <time.h>
#include <iostream>

int main()
{
  clock_t time;
  int i, j;

  int vector[81] = { 0,  1,  2,  3,  4,  5,  6,  7,  8,
                     9, 10, 11, 12, 13, 14, 15, 16, 17,
                    18, 19, 20, 21, 22, 23, 24, 25, 26,
                    27, 28, 29, 30, 31, 32, 33, 34, 35,
                    36, 37, 38, 39, 40, 41, 42, 43, 44,
                    45, 46, 47, 48, 49, 50, 51, 52, 53,
                    54, 55, 56, 57, 58, 59, 60, 61, 62,
                    63, 64, 65, 66, 67, 68, 69, 70, 71,
                    72, 73, 74, 75, 76, 77, 78, 79, 80};

  time = clock();

  for (i = 0; i < 100000; i++)
  {
    for (j = 0; j < 81; j++)
    {
      vector[j];
    }
  }

  time = clock() - time;
  std::cout << "Clicks:     " << time << std::endl;
  std::cout << "Time taken: " << (double) time / CLOCKS_PER_SEC << "s" << std::endl;
}

After executing program 1 my output is:

Clicks:     8106
Time taken: 0.008106s

After executing program 2 my output is:

Clicks:     15958
Time taken: 0.015958s

It is my understanding that a 1D array is stored in a continuous block of memory. Likewise the rows of a static 2D array are stored in contiguous blocks of memory. Conversely the rows of a dynamic 2d array might not be stored in contiguous blocks of memory. If this is true then program 2 should be at least similar in speed to program 1 therefore my question is why would program 1 be remarkably faster than program 2?

  • Correct me if I'm wrong, but doesn't c treat 2d and 1d arrays pretty much the same? – KyleKW Nov 22 '16 at 17:17
  • 1
    What compiler and OS did you test it on? – R Sahu Nov 22 '16 at 17:22
  • @KyleKW I believe so but the indexing would be different i.e in the 2D array you access each element with an index = width * column index + row index. My concern here is why would the 2D array be faster to access than the 1D array. – JeliBeanMachine Nov 22 '16 at 17:23
  • @RSahu g++ and linux – JeliBeanMachine Nov 22 '16 at 17:25
  • This may be due to the assembly code generated differently, thus depending on your compiler. Also be careful as some optimization option might omit some part of your code as it is found useless (like just writing `matrix[x][y];`). Another explanation could be that it is related to your hardware, as memory addresses appears to be calculated differently, it could involve some hidden processor cache mechanisms. Although, these are just wild guesses. – Wexiwa Nov 22 '16 at 17:28
  • 1
    Following what I was saying, you may want to look at this [answer](http://stackoverflow.com/a/1951271/6704294) (2nd and 3rd paragraph are talking about caching and [prefetching](https://en.wikipedia.org/wiki/Instruction_prefetch) ) – Wexiwa Nov 22 '16 at 17:44
  • It's not a fair test, because in both cases, you aren't doing anything with the numbers. A good optimizer will discard everything between `clock()` calls. – Mark Lakata Nov 22 '16 at 18:00
  • That said, the first 2D array is "ragged" or "jagged" meaning that each row of the matrix is not necessary directly adjacent to the next row, while the 1D array, everything is adjacent. It could be that the 2D array has gaps that aid in aligning the data, and that could aid in SIMD vectorization (i.e. the `int` data is probably 32 bits, but your processor can load 64,128, 256 or 512 bits at a time). If you align the `int` data properly, you could load the data more efficiently. – Mark Lakata Nov 22 '16 at 18:03
  • Possible duplicate of [1D or 2D array, what's faster?](http://stackoverflow.com/questions/17259877/1d-or-2d-array-whats-faster) – Jon Flow Nov 22 '16 at 18:21
  • 1
    Trying to measure performance of such short period will often be biased due to noise caused by other processes - scheduler manages context switches on a level of 10-20 ms so you need your test to be atleast a magnitude slower for it to have no effect. After increasing number of iterations to `10000000` I got reverse results: `4.69s` for 2D array and `3.44s` for 1D array, in both cases error in measurement is less than 1%. – thorhunter Nov 22 '16 at 18:23
  • I can get (close to) `Time taken: 0.015958s` only in Debug mode. In Release mode I get `Time taken: 0s`. – Bo Persson Nov 22 '16 at 18:28

2 Answers2

2

Here is what I found:

  1. If you actually make use of the value then the run time is almost the same, for example, change matrix[x][y]; to matrix[x][y] += 1; and vector[j]; to vector[j] += 1;

    >     Clicks:     28519
    >     Time taken: 0.028519s
    

    and

    >     Clicks:     29941
    >     Time taken: 0.029941s
    
  2. Without the above changes, optimize while compiling, g++ -O3 <filename>.cpp, this results in same time, got the same following output for both programs:

    $./a.out

    >     Clicks:     2
    >     Time taken: 2e-06s
    

So, what you are pointing out is compiler optimizations.

Enigma
  • 329
  • 1
  • 10
  • Not sure why the downvote for you, but I countered it for now. I would lead off your answer with your conclusion though, because these questions come up so often and 99% of the time, it is because the are coming to conclusions on way to limited a set of data. – Michael Dorgan Nov 22 '16 at 19:14
  • @michaelDorgan Thank you :) – Enigma Nov 22 '16 at 21:08
1

The loops are likely to be removed (kind of optimising) by compiler, because

  1. You actually did nothing in loops.

  2. The matrix can be treated as a const array.

  3. program 1 is faster than program 2. ( :< )

To see whether the deletion happens to your code during compiling, you can increase the most outer loop by 100 times, and see whether the time needed for execution is increased significantly (not necessarily by exact 100 times).

If true, you can prevent this kind of optimising by doing some actual works in loop (calculate the sum, and don't forget printing it afterwards) and introduce some "unpredictable" changes to your matrix, for example:

srand(10);
for (int i=0; i<9; ++i) {
  matrix[i][i] = rand()%100;
}

And further, compiler may conduct some other optimising to your code, for example, expand your loops, even the address of the element you are visiting (they are no longer calculated at run time), you can prevent this by making the executing times of loops "unpredictable" :

#include <chrono>
#include <iostream>
#include <cstdlib>

int array[100];
int array2[10][10];

int64_t Sum1D(int len) {
  int64_t sum = 0;
  for (int i=0; i<100000; ++i) {
    for (int j=0; j<len; ++j) {
        sum += array[j];
    }
  }
  return sum;
}

int64_t Sum2D(int len1, int len2) {
  int64_t sum = 0;
  for (int i=0; i<100000; ++i) {
    for (int j=0; j<len1; ++j) {
      for (int k=0; k<len2; ++k)
        sum += array2[j][k];
    }
  }
  return sum;
}

int main()
{
  for (int i=0; i<100; ++i) {
    array[i] = rand();
    array2[i%10][i/10] = rand();
  }

  auto time = std::chrono::steady_clock::now();

  //int64_t sum = Sum1D(100);
  int64_t sum = Sum2D(10,10);

  auto duration = std::chrono::steady_clock::now()-time;
  std::cout << sum << "!" << duration.count() << std::endl;

  return 0;
} 

which finally makes program1 slower than program2. ( :> )

felix
  • 2,213
  • 7
  • 16