I have set up a test program to compare array access performance to that of std::vector. I have found several similar questions, but none seem to address my specific concern. I was scratching my head for some time over why array access seemed to be 6 times faster than vector access, when I have read in the past that the two should be equivalent.

As it turns out, this seems to be a function of the Intel compiler (v12) and of optimization (it occurs with anything above -O1): I see better performance with std::vector when using gcc v4.1.2, and the array has only a 2x advantage with gcc v4.4.4. I am running the tests on a RHEL 5.8 machine with Xeon X5355 cores. As an aside, I have found iterators to be faster than element access.
I am compiling with the following commands:
icpc -fast test.cc
g++44 -O3 test.cc
Can anyone explain the dramatic improvement in speed?
#include <vector>
#include <iostream>
#include <ctime>   // clock(), clock_t, CLOCKS_PER_SEC
using namespace std;
int main() {
int sz = 100;
clock_t start,stop;
int ncycle=1000;
float temp = 1.1;
// Set up and initialize vector
vector< vector< vector<float> > > A(sz, vector< vector<float> >(sz, vector<float>(sz, 1.0)));
// Set up and initialize array
float*** a = new float**[sz];
for( int i=0; i<sz; ++i) {
a[i] = new float*[sz];
for( int j=0; j<sz; ++j) {
a[i][j] = new float[sz]();
for( int k=0; k<sz; ++k)
a[i][j][k] = 1.0;
}
}
// Time the array
start = clock();
for( int n=0; n<ncycle; ++n )
for( int i=0; i<sz; ++i )
for( int j=0; j<sz; ++j )
for( int k=0; k<sz; ++k )
a[i][j][k] *= temp;
stop = clock();
std::cout << "STD ARRAY: " << double((stop - start)) / CLOCKS_PER_SEC << " seconds" << std::endl;
// Time the vector
start = clock();
for( int n=0; n < ncycle; ++n )
for (vector<vector<vector<float> > >::iterator it1 = A.begin(); it1 != A.end(); ++it1)
for (vector<vector<float> >::iterator it2 = it1->begin(); it2 != it1->end(); ++it2)
for (vector<float>::iterator it3 =it2->begin(); it3 != it2->end(); ++it3)
*it3 *= temp;
/* Indexed version, for comparison:
for( int n=0; n < ncycle; ++n )
for( int i=0; i < sz; ++i )
for( int j=0; j < sz; ++j )
for( int k=0; k < sz; ++k )
A[i][j][k] *= temp;
*/
stop = clock();
std::cout << "VECTOR: " << double((stop - start)) / CLOCKS_PER_SEC << " seconds" << std::endl;
// Clean up the raw array (use sz rather than a hard-coded 100)
for( int i=0; i<sz; ++i) {
    for( int j=0; j<sz; ++j)
        delete[] a[i][j];
    delete[] a[i];
}
delete[] a;
return 0;
}
SOLVED
After noting Bo's point that the compiler "knows everything" about the loop and can therefore optimize it more aggressively than the vector case, I replaced the multiplications by "temp" with multiplications by a call to "rand()". This leveled the playing field, and in fact std::vector now seems to have a slight lead. Timings for the various scenarios are as follows:
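To illustrate the idea, here is a minimal sketch of the change (the function name `sum_after_scaling` is mine, not from the original program): multiplying by `rand()` instead of a loop-invariant `temp` forces the compiler to emit a real multiply on every iteration, and returning a sum of the results keeps the whole loop from being optimized away.

```cpp
#include <cstdlib>
#include <vector>

// Scale every element by an opaque value and return the sum.
// rand() cannot be constant-folded, so the multiply survives in
// both the raw-array and the std::vector version of this loop.
float sum_after_scaling(std::vector<float>& v, int ncycle) {
    for (int n = 0; n < ncycle; ++n)
        for (std::vector<float>::iterator it = v.begin(); it != v.end(); ++it)
            *it *= rand();   // opaque multiplier defeats constant propagation
    float s = 0.0f;
    for (std::size_t i = 0; i < v.size(); ++i)
        s += v[i];           // using the result keeps the loop from being elided
    return s;
}
```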
ARRAY (flat): 111.15 seconds
ARRAY (flat): 0.011115 seconds per cycle
ARRAY (3d): 111.73 seconds
ARRAY (3d): 0.011173 seconds per cycle
VECTOR (flat): 110.51 seconds
VECTOR (flat): 0.011051 seconds per cycle
VECTOR (3d): 118.05 seconds
VECTOR (3d): 0.011805 seconds per cycle
VECTOR (flat iterator): 108.55 seconds
VECTOR (flat iterator): 0.010855 seconds per cycle
VECTOR (3d iterator): 111.93 seconds
VECTOR (3d iterator): 0.011193 seconds per cycle
The takeaway seems to be that vectors are just as fast as arrays, and slightly faster when flattened (contiguous memory) and used with iterators. My experiment only averaged over 10,000 iterations, so it could be argued that these are all roughly equivalent and the choice of which to use should be determined by whichever is easiest to use; in my case, that would be the "3d iterator" case.
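For reference, the "flat" layout the timings refer to can be sketched as a single contiguous vector of sz*sz*sz floats indexed as (i*sz + j)*sz + k; the helper name `idx3` and the function `scale_flat` are hypothetical, not from the original program.

```cpp
#include <vector>
#include <cstddef>

// Map a 3D index (i, j, k) into a flat sz*sz*sz buffer.
inline std::size_t idx3(int i, int j, int k, int sz) {
    return (static_cast<std::size_t>(i) * sz + j) * sz + k;
}

// Scale every element in place with one contiguous iterator pass,
// the access pattern that came out fastest in the timings above.
void scale_flat(std::vector<float>& flat, float factor) {
    for (std::vector<float>::iterator it = flat.begin(); it != flat.end(); ++it)
        *it *= factor;
}
```

Usage would look like `std::vector<float> flat(sz * sz * sz, 1.0f); scale_flat(flat, 1.1f);` with element (i, j, k) read back as `flat[idx3(i, j, k, sz)]`.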