6

I'm trying to optimize my C++ code. I've searched the internet on using dynamically allocated C++ arrays vs using std::vector and have generally seen a recommendation in favor of std::vector, with the performance difference between the two described as negligible. For instance, see "Using arrays or std::vectors in C++, what's the performance gap?".

However, I wrote some code to test the performance of iterating through an array/vector and assigning values to the elements and I generally found that using dynamically allocated arrays was nearly 3 times faster than using vectors (I did specify a size for the vectors beforehand). I used g++-4.3.2.

However I feel that my test may have ignored issues I don't know about so I would appreciate any advice on this issue.

Thanks

Code used -

#include <time.h>
#include <iostream>
#include <vector>

using namespace std;

int main() {
  clock_t start,end;
  std::vector<int> vec(9999999);
  std::vector<int>::iterator vecIt = vec.begin();
  std::vector<int>::iterator vecEnd = vec.end();

  start = clock();
  for (int i = 0; vecIt != vecEnd; i++) {
    *(vecIt++) = i;
  }
  end = clock();
  cout<<"vector: "<<(double)(end-start)/CLOCKS_PER_SEC<<endl;

  int* arr = new int[9999999];
  start = clock();
  for (int i = 0; i < 9999999; i++) {
    arr[i] = i;
  }
  end = clock();
  cout<<"array: "<<(double)(end-start)/CLOCKS_PER_SEC<<endl;
}
Opt
  • Yes, let's see your benchmark code. Often, performance problems with STL containers is due to usage. – Fred Larson Jul 01 '09 at 22:33
  • and make sure the benchmark is compiled with debugging off and optimisations on –  Jul 01 '09 at 22:38
  • I just ran your test code and got: $ ./array vector: 0.021851 array: 0.059796 So I'm seeing the vector version faster! – DaveR Jul 01 '09 at 22:38
  • @Sid - Use the little ones-and-zeros button when you post code so that it will be formatted correctly. – Fred Larson Jul 01 '09 at 22:39
  • The optimizations weren't on while compiling. std::vector is faster now. Thanks for the help. – Opt Jul 01 '09 at 22:40
  • @Neil, I think your comment is what should be the accepted answer. Maybe you should make it an answer so we can vote it up and Sid can accept it. 8v) – Fred Larson Jul 01 '09 at 22:43
  • On my machine (OS X 10.5.7 g++ 4.0.1) the above code gives primitive arrays are about 2.5 times slower than vectors. – wilhelmtell Jul 01 '09 at 22:45
  • Sid: There's definitely something unfair about your test - see my post below - running each test 100 times shows they take basically the same time! – DaveR Jul 01 '09 at 23:17
  • @Sid: Ok; so the std::vector case already has its data in the CPU cache, which is why it's 3x faster. See below for details. – DaveR Jul 01 '09 at 23:47
  • @Dave Rigby: Thanks for pointing out the flaw in the test. – Opt Jul 01 '09 at 23:50

10 Answers

21

When benchmarking C++ containers, it's important to enable most compiler optimisations. Several of my own answers on SO have fallen foul of this - for example, the function call overhead when something like operator[] is not inlined can be very significant.

5

Just for fun, try iterating over the plain array using a pointer instead of an integer index (the code should look just like the vector iteration, since the point of STL iterators is to appear like pointer arithmetic for most operations). I bet the speed will be exactly equal in that case. Which of course means you should pick the vector, since it will save you a world of headaches from managing arrays by hand.

rmeador
5

The thing about the standard library classes such as std::vector is that yes, naively, it is a lot more code than a raw array. But all of it can be trivially inlined by the compiler, which means that if optimizations are enabled, it becomes essentially the same code as if you'd used a raw array. The speed difference then is not negligible but non-existent. All the overhead is removed at compile-time.

But that requires compiler optimizations to be enabled.

jalf
3

I imagine the reason why you found iterating over and assigning to the std::vector 3 times slower than a plain array is a combination of the cost of iterating the vector and doing the assignment.

Edit:

That was my initial assumption before the testcase; however running the testcase (compiled with -O3) shows the converse - std::vector is actually 3 times faster, which surprised me.

I can't see how std::vector could be faster (certainly not 3 times faster) than a vanilla array copy - I think there's some optimisation being applied to the std::vector compiled code which isn't happening for the array version.

Original benchmark results:

$ ./array
array:  0.059375
vector: 0.021209
  • std::vector is 3x faster. Same benchmark again, except with an additional outer loop that runs the test loop 1000 times:

    $ ./array
    array:  21.7129
    vector: 21.6413

  • std::vector is now about the same speed as the array.

Edit 2

Found it! So the problem with your test case is that in the vector case the memory holding the data appears to be already in the CPU cache - either by the way it is initialised, or due to the call to vec.end(). If I 'warm' up the CPU cache before each timing test, I get the same numbers for array and vector:

#include <time.h>
#include <iostream>
#include <vector>

int main() {
  clock_t start,end;
  std::vector<int> vec(9999999);
  std::vector<int>::iterator vecIt = vec.begin();
  std::vector<int>::iterator vecEnd = vec.end();

  // get vec into CPU cache.
  for (int i = 0; vecIt != vecEnd; i++) { *(vecIt++) = i; }
  vecIt = vec.begin();
  start = clock();
  for (int i = 0; vecIt != vecEnd; i++) {
    *(vecIt++) = i;
  }
  end = clock();
  std::cout<<"vector: "<<(double)(end-start)/CLOCKS_PER_SEC<<std::endl;

  int* arr = new int[9999999];

  // get arr into CPU cache.
  for (int i = 0; i < 9999999; i++) { arr[i] = i; }
  start = clock();
  for (int i = 0; i < 9999999; i++) {
    arr[i] = i;
  }
  end = clock();
  std::cout<<"array: "<<(double)(end-start)/CLOCKS_PER_SEC<<std::endl;
}

This gives me the following result:

$ ./array
vector: 0.020875
array: 0.020695
DaveR
  • it is also not safe. It doesn't do bounds checking by default. – jalf Jul 01 '09 at 22:49
  • Maybe it's caching the array addresses in the CPU register which is why further runs are much faster? I tried looping 2 times instead of 1000 and while the first run took 0.06s, the second run took 0.25s – Opt Jul 01 '09 at 23:26
  • Of course that doesn't explain why vector should run at optimal speed right from the start. – Opt Jul 01 '09 at 23:27
  • @jalf: Didn't realise that - shows how long it's been since I used std::vector in anger! I've updated the post to remove the mistake. – DaveR Jul 01 '09 at 23:54
1

I agree with rmeador,

  for (int i = 0; vecIt != vecEnd; i++) {
    *(vecIt++) = i; // <-- quick offset calculation
  }
  end = clock();
  cout<<"vector: "<<(double)(end-start)/CLOCKS_PER_SEC<<endl;

  int* arr = new int[9999999];
  start = clock();
  for (int i = 0; i < 9999999; i++) {
    arr[i] = i; // <-- not fair play :) - offset = arr + i*sizeof(int)
  }
Nick Dandoulakis
1

I think the answer here is obvious: it doesn't matter. Like jalf said the code will end up being about the same, but even if it wasn't, look at the numbers. The code you posted creates a huge array of 10 MILLION items, yet iterating over the entire array takes only a few hundredths of a second.

Even if your application really is working with that much data, whatever it is you're actually doing with that data is likely to take much more time than iterating over your array. Just use whichever data structure you prefer, and focus your time on the rest of your code.

To prove my point, here's the code with one change: the assignment of i to the array item is replaced with an assignment of sqrt(i). On my machine using -O2, the execution time triples from .02 to .06 seconds.

#include <time.h>
#include <iostream>
#include <vector>
#include <math.h>

using namespace std;

int main() {
  clock_t start,end;
  std::vector<int> vec(9999999);
  std::vector<int>::iterator vecIt = vec.begin();
  std::vector<int>::iterator vecEnd = vec.end();

  start = clock();
  for (int i = 0; vecIt != vecEnd; i++) {
    *(vecIt++) = (int)sqrt((double)i);
  }
  end = clock();
  cout<<"vector: "<<(double)(end-start)/CLOCKS_PER_SEC<<endl;

  int* arr = new int[9999999];
  start = clock();
  for (int i = 0; i < 9999999; i++) {
    arr[i] = i;
  }
  end = clock();
  cout<<"array: "<<(double)(end-start)/CLOCKS_PER_SEC<<endl;
}
orangejulius
0

The issue seems to be that you compiled your code with optimizations turned off. On my machine, OS X 10.5.7 with g++ 4.0.1 I actually see that the vector is faster than primitive arrays by a factor of 2.5.

With gcc, try passing -O2 to the compiler and see if there's any improvement.

wilhelmtell
0

One reason your array iteration could be faster is that the number of iterations is a compile-time constant, so the compiler is able to unroll the loop. Try using rand to generate a number, and multiply it up to the large size you want, so that the compiler won't be able to figure the count out at compile time. Then try it again; you should see similar runtimes.

leiz
0

One reason your code might not be performing quite the same is that in your std::vector version you are incrementing two values: the integer i and the iterator vecIt. To really be equivalent, you could refactor to:

start = clock();
for (int i = 0; i < vec.size(); i++) {
  vec[i] = i;
}
end = clock();
cout<<"vector: "<<(double)(end-start)/CLOCKS_PER_SEC<<endl;
SingleNegationElimination
0

Your code provides an unfair comparison between the two cases since you're doing far more work in the vector test than the array test.

With the vector, you're incrementing both the iterator (vecIt) and a separate variable (i) for generating the assignment values.

With the array, you're only incrementing the variable i and using it for dual purpose.

csj