C++ Array vs Vector performance test explanation

Question

In order to quantify the difference in performance of a C-like array and Vectors in C++, I wrote this little program. https://github.com/rajatkhanduja/Benchmarks/blob/master/C%2B%2B/vectorVsArray.cpp

To compare them on common grounds, I decided to test both for random and sequential access. I added iterators, just to compare them as well (but that is not what the question focusses on).

The results, for a 64-bit Linux machine with 7.7 GB RAM with on an array/vector size of 1 million are as follows:-

Time taken to write to array. : 12.0378 ms
Time taken to read from array sequentially. : 2.48413 ms
Time taken to read from array randomly. : 37.3931 ms
Time taken to write to dynamic array. : 11.7458 ms
Time taken to read from dynamic array sequentially. : 2.85107 ms
Time taken to read from dynamic array randomly. : 36.0579 ms
Time taken to write to vector using indices. : 11.3909 ms
Time taken to read from vector using indices, sequentially. : 4.09106 ms
Time taken to read from vector using indices, randomly. : 39 ms
Time taken to write to vector using iterators. : 24.9949 ms
Time taken to read from vector using iterators. : 18.8049 ms

The vector's size is set at the time of initialization and not altered, so there is no resizing of the vector (the assertion in the program helps verify that). The times don't include the initialization times of any of the statically allocated array, dynamically allocated array or the vector.

According to the statistics, the time to write to Vector is lesser than that of array but the time to read from vector is twice as much as that of array.

The difference is pretty small, but is there an explanation as to why there is a performance difference ? Is there something wrong with the test ? I expected both to perform at the same speed. The repetition of this test shows the same trend.

The code:

#include <vector>
#include <iostream>
#include <cstdlib>
#include <ctime>
#include <sys/time.h>
#include <cassert>

#define ARR_SIZE 1000000

using std::string;

void printtime (struct timeval& start, struct timeval& end, string str);   

int main (void)
{
  int arr[ARR_SIZE];
  int tmp;
  struct timeval start, stop;

  srand (time (NULL));

  /* Writing data to array */
  gettimeofday (&start, NULL);
  for (int i = 0; i < ARR_SIZE; i++)
  {
    arr[i] = rand();
  }
  gettimeofday (&stop, NULL);
  printtime (start, stop, string ("Time taken to write to array."));

  /* Reading data from array */
  gettimeofday (&start, NULL);
  for (int i = 0; i < ARR_SIZE; i++)
  {
    tmp = arr[i];
  }
  gettimeofday (&stop, NULL);
  printtime (start, stop, string ("Time taken to read from array sequentially."));

  /* Reading data from array randomly*/
  gettimeofday (&start, NULL);
  for (int i = 0; i < ARR_SIZE; i++)
  {
    tmp = arr[rand() % ARR_SIZE];
  }
  gettimeofday (&stop, NULL);
  printtime (start, stop, string ("Time taken to read from array randomly."));


  int *darr = (int *) calloc (sizeof (int), ARR_SIZE);  

  /* Writing data to array */
  gettimeofday (&start, NULL);
  for (int i = 0; i < ARR_SIZE; i++)
  {
    darr[i] = rand();
  }
  gettimeofday (&stop, NULL);
  printtime (start, stop, string ("Time taken to write to dynamic array."));

  /* Reading data from array */
  gettimeofday (&start, NULL);
  for (int i = 0; i < ARR_SIZE; i++)
  {
    tmp = darr[i];
  }
  gettimeofday (&stop, NULL);
  printtime (start, stop, string ("Time taken to read from dynamic array sequentially."));

  /* Reading data from dynamic array randomly*/
  gettimeofday (&start, NULL);
  for (int i = 0; i < ARR_SIZE; i++)
  {
    tmp = darr[rand() % ARR_SIZE];
  }
  gettimeofday (&stop, NULL);
  printtime (start, stop, string ("Time taken to read from dynamic array randomly."));

  std::vector<int> v(ARR_SIZE);
  assert (v.capacity() == ARR_SIZE);

  /* Writing to vector using indices*/
  gettimeofday (&start, NULL);
  for (int i = 0; i < ARR_SIZE; i++)
  {
    v[i] = rand();
  }
  gettimeofday (&stop, NULL);
  printtime (start, stop, string ("Time taken to write to vector using indices."));
  assert (v.capacity() == ARR_SIZE);

  /* Reading from vector using indices*/
  gettimeofday (&start, NULL);
  for (int i = 0; i < ARR_SIZE; i++)
  {
    tmp = v[i];
  }
  gettimeofday (&stop, NULL);
  printtime (start, stop, string ("Time taken to read from vector using indices, sequentially."));

  /* Reading data from dynamic array randomly*/
  gettimeofday (&start, NULL);
  for (int i = 0; i < ARR_SIZE; i++)
  {
    tmp = v[rand() % ARR_SIZE];
  }
  gettimeofday (&stop, NULL);
  printtime (start, stop, string ("Time taken to read from vector using indices, randomly."));

  std::vector<int> v2(ARR_SIZE);

  /* Writing to vector using iterators*/
  gettimeofday (&start, NULL);
  std::vector<int>::iterator itr, itr_end;
  for (itr = v2.begin(), itr_end = v2.end(); itr != itr_end; itr++)
  {
    *itr = rand();
  }
  gettimeofday (&stop, NULL);
  printtime (start, stop, string ("Time taken to write to vector using iterators."));


  /* Reading from vector using iterators*/
  gettimeofday (&start, NULL);
  for (itr = v2.begin(), itr_end = v2.end(); itr != itr_end; itr++)
  {
    tmp = *itr;
  }
  gettimeofday (&stop, NULL);
  printtime (start, stop, string ("Time taken to read from vector using iterators."));

  return 0;
}

void printtime (struct timeval& start, struct timeval& end, string str)
{
  double start_time, end_time, diff;

  start_time = ((start.tv_sec) * 1000 + start.tv_usec/1000.0);
  end_time   = ((end.tv_sec) * 1000 + end.tv_usec/1000.0);
  diff = end_time - start_time;

  std::cout << str << " : " << diff << " ms" << std::endl;
}

EDIT

As suggested in comments, here is more information :-

Compiler :- g++ - 4.5.2
Flags :- none (i.e. default values)
Optimizations :- None (I wanted to test the behavior in a usual setting. Optimization could vary the behavior of the program, for instance since the variable tmp is never used, the step of reading the vector/array may altogether be skipped or reduced to just the last assignment. At least that's what I understand).

Does anything change if you perform the tests in random order? — Matt K, Jun 04 '12 at 20:13
When discussing performance, post 1. the code and 2. the compiler options. There are too many variables at play to start guessing, we need that info at a minimum. We don't even know if your test is valid or has a large enough sample size to rule out anomalies. — Ed S., Jun 04 '12 at 20:13
Are you compiling in release mode? With all optimizations on? Running without a debugger attached? — Asik, Jun 04 '12 at 20:14
@Ed, the code is linked. I realize some people prefer it inline. — Matthew Flaschen, Jun 04 '12 at 20:16
You should consider moving the `rand()` function outside of the timing "window". — jedwards, Jun 04 '12 at 20:17
@MatthewFlaschen: I saw the code (inline is nice though), but I really was just stating the two things that need to be in your post *at a minimum*. The OP is currently 1 for 2. — Ed S., Jun 04 '12 at 20:18
Look at the assembly code generated by the compiler and find the different ways that the loops are implemented. — JohnPS, Jun 04 '12 at 20:21
You missed "{read,write} to {array,dynamic array} using pointers." — Robᵩ, Jun 04 '12 at 20:29
The time required to access the data will be highly dependent on if the data is in the (faster) processor cache memory or not. I would try to minimize the chance of the data not being in the cache. I.E. Ignore the first iteration. It generally is not in cache and causes it to loaded. A smaller data set with more iterations may minimize the chance of it being partially in the cache. — Jay, Jun 04 '12 at 20:34
@Rob Are you talking about this https://github.com/rajatkhanduja/Benchmarks/blob/master/C/Pointers/arrayIndexVsPointer.c ? Although, I had tested it for C (gcc) earlier, I didn't consider it necessary here. I was convinced of the faster method among those two. — rajatkhanduja, Jun 04 '12 at 20:58
Absolutely meaningless to benchmark (that is, *test the speed of*) code that you don't optimize for speed. Would you compare the top speed of race cars if they were required to stay in 1st gear? Closing as not constructive. — GManNickG, Jun 04 '12 at 20:58
"*Optimizations :- None*" That makes this question 100% pointless. — ildjarn, Jun 04 '12 at 20:59
*"Optimizations :- None (I wanted to test the behavior in a usual setting."* - A "usual" setting would have optimizations on. As others have said, testing performance without optimizations is completely pointless. — Ed S., Jun 04 '12 at 21:02
@rajatkhanduja - No, more like `for(int *beg=arr,*end=arr+ARR_SIZE; beg < end; beg++) { *beg = rand(); }`. — Robᵩ, Jun 04 '12 at 21:02
4.09106 / 2.85107 < 2. With times that small, there are a number of reasons not related to the code for this discrepancy. You probably want to run the test several times. — JohnMcG, Jun 04 '12 at 21:07
I might be wrong here, and considering the vehement response, I think I most certainly am, but I skipped the 'compiler' optimization because it would involve the compiler making some decisions that I don't intend to. For instance, as stefaanv pointed out, the compiler could optimize the loop itself leaving the test useless. That was my idea of first testing and understanding the outcome without any compiler optimization. — rajatkhanduja, Jun 04 '12 at 21:11
@rajatkhanduja - The flaw in the thinking is to time the code to see how fast it is, without asking the compiler to try to *make* it fast. This has been compared to timing Usain Bolt when he is *walking* 100 meters. It is not interesting unless you ask him to run. — Bo Persson, Jun 04 '12 at 21:54
yeah, i was surprised & worried until i saw no `-O3`. java's faster than unoptimized c++ lol. however, this is an awesome post! please redo the random index reads with `-O3`!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! — , Jul 05 '13 at 03:08

score 4 · Answer 1 · answered Jun 04 '12 at 20:25

Certainly not a definitive answer, but you are writing in a loop to a variable, meaning that a compiler can easily guess what the end result should be for sequential reading, thus optimizing the loop away. Since it's obviously not doing this, I assume that there is no optimization which definitely doesn't favor the iterator approach. The other numbers are too close to take conclusions.

C++ Array vs Vector performance test explanation

1 Answers1

Linked