std::vector::operator[] should be rather efficient; however, the compiler has to be paranoid: across every call to a function it cannot see into, it must assume that the vector could have been moved somewhere else in memory.
For example, in this code

for (int i = 0, n = v.size(); i < n; i++)
{
    total += v[i] + foo();
}
if the code of foo() isn't known in advance, the compiler is forced to reload the vector's start address on every iteration, because the vector could have been reallocated as a consequence of code inside foo().
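To see why the compiler cannot rule this out, here is a minimal sketch of a foo() that really does move the vector (the global v and the body of foo() are made up for illustration):

#include <vector>

std::vector<double> v(1000, 1.0);  // globally visible, so foo() can reach it

double foo()
{
    v.push_back(0.0);  // may reallocate: after this call &v[0] can be different
    return 0.0;
}

As long as foo() lives in another translation unit, the compiler of the loop cannot prove it doesn't do something like this.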
If you know for sure that the vector is not going to be moved in memory or reallocated, then you can factor this lookup out with something like
double *vptr = &v[0];  // address of the first element
for (int i = 0, n = v.size(); i < n; i++)
{
    total += vptr[i] + foo();
}
With this approach one memory load per iteration is saved (vptr is likely to stay in a register for the whole loop).
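Putting it together, a self-contained sketch of the hoisted version (assuming foo() never resizes or reassigns v; v.data() is the C++11 spelling of &v[0]):

#include <cstddef>
#include <vector>

double foo();  // assumed not to touch v's buffer

double sum(const std::vector<double>& v)
{
    double total = 0.0;
    const double* vptr = v.data();  // loaded once, valid for the whole loop
    for (std::size_t i = 0, n = v.size(); i < n; i++)
        total += vptr[i] + foo();
    return total;
}

If that assumption is ever violated (foo() grows the vector, for instance), vptr dangles and the loop is undefined behavior, so this is strictly an optimization for buffers you control.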
Another possible reason for inefficiency is cache thrashing. To see whether this is the problem, an easy trick is to simply over-allocate your vectors by some uneven number of elements.
The reason is that, because of how caching works, if you have many vectors with e.g. 4096 elements, all of them will happen to have the same low-order bits in their addresses, and you may end up losing a lot of speed because of cache conflicts (the buffers all compete for the same cache sets).
For example, this loop on my PC
std::vector<double> v1(n), v2(n), v3(n), v4(n), v5(n);
for (int i = 0; i < 1000000; i++)
    for (int j = 0; j < 1000; j++)
    {
        v1[j] = v2[j] + v3[j];
        v2[j] = v3[j] + v4[j];
        v3[j] = v4[j] + v5[j];
        v4[j] = v5[j] + v1[j];
        v5[j] = v1[j] + v2[j];
    }
executes in about 8.1 seconds when n == 8191 and in about 3.2 seconds when n == 10000. Note that the inner loop always runs from 0 to 999, independently of the value of n; what differs is only the memory addresses.
Depending on the processor/architecture, I've observed slowdowns of even 10x because of cache thrashing.
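A sketch of the over-allocation trick described above (the per-vector pad of 16 doubles is an arbitrary, untuned choice; any offsets that keep the buffers from sharing low-order address bits should do):

#include <cstddef>
#include <vector>

const std::size_t n = 8192;  // a power-of-two size, the worst case
const std::size_t pad = 16;  // arbitrary extra elements, different per vector

std::vector<double> v1(n + 1 * pad), v2(n + 2 * pad),
                    v3(n + 3 * pad), v4(n + 4 * pad),
                    v5(n + 5 * pad);

The extra elements are wasted memory, but with a typical allocator the differing sizes stagger the buffers' start addresses, so v1[j] through v5[j] are less likely to map to the same cache set for every j.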