std::vector::operator[] should be rather efficient; however, the compiler has to be paranoid: across every call to a function it cannot see into, it must assume that the vector could have been moved somewhere else in memory.
For example, in this code

for (int i = 0, n = v.size(); i < n; i++)
{
    total += v[i] + foo();
}
if the code of foo() isn't known in advance, the compiler is forced to reload the vector's start address on every iteration, because the vector could have been reallocated as a consequence of code inside foo().
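To see why the compiler cannot rule this out, here is a minimal sketch of a foo() that really does move the vector (the global v and the body of foo() are made up for illustration):

#include <vector>

std::vector<double> v(1000, 1.0);  // globally visible, so foo() can reach it

double foo()
{
    v.push_back(0.0);  // may reallocate: after this call &v[0] can be different
    return 0.0;
}

As long as foo() lives in another translation unit, the compiler of the loop cannot prove it doesn't do something like this.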
If you know for sure that the vector is not going to be moved in memory or reallocated, then you can factor this lookup out with something like
double *vptr = &v[0];  // address of the first element
for (int i = 0, n = v.size(); i < n; i++)
{
    total += vptr[i] + foo();
}
With this approach one memory load per iteration is saved (vptr is likely to stay in a register for the whole loop).
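Putting it together, a self-contained sketch of the hoisted version (assuming foo() never resizes or reassigns v; v.data() is the C++11 spelling of &v[0]):

#include <cstddef>
#include <vector>

double foo();  // assumed not to touch v's buffer

double sum(const std::vector<double>& v)
{
    double total = 0.0;
    const double* vptr = v.data();  // loaded once, valid for the whole loop
    for (std::size_t i = 0, n = v.size(); i < n; i++)
        total += vptr[i] + foo();
    return total;
}

If that assumption is ever violated (foo() grows the vector, for instance), vptr dangles and the loop is undefined behavior, so this is strictly an optimization for buffers you control.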
Another possible reason for inefficiency is cache thrashing. To see whether this is the problem, an easy trick is to simply over-allocate your vectors by some uneven number of elements.
The reason is that, because of how caching works, if you have many vectors with e.g. 4096 elements, all of them will happen to have the same low-order bits in their addresses, and you may end up losing a lot of speed because of cache conflicts (the buffers all compete for the same cache sets).
For example, this loop on my PC
std::vector<double> v1(n), v2(n), v3(n), v4(n), v5(n);
for (int i = 0; i < 1000000; i++)
    for (int j = 0; j < 1000; j++)
    {
        v1[j] = v2[j] + v3[j];
        v2[j] = v3[j] + v4[j];
        v3[j] = v4[j] + v5[j];
        v4[j] = v5[j] + v1[j];
        v5[j] = v1[j] + v2[j];
    }
executes in about 8.1 seconds when n == 8191 and in about 3.2 seconds when n == 10000. Note that the inner loop always runs from 0 to 999, independently of the value of n; what differs is only the memory addresses.
Depending on the processor/architecture, I've observed slowdowns of even 10x because of cache thrashing.
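A sketch of the over-allocation trick described above (the per-vector pad of 16 doubles is an arbitrary, untuned choice; any offsets that keep the buffers from sharing low-order address bits should do):

#include <cstddef>
#include <vector>

const std::size_t n = 8192;  // a power-of-two size, the worst case
const std::size_t pad = 16;  // arbitrary extra elements, different per vector

std::vector<double> v1(n + 1 * pad), v2(n + 2 * pad),
                    v3(n + 3 * pad), v4(n + 4 * pad),
                    v5(n + 5 * pad);

The extra elements are wasted memory, but with a typical allocator the differing sizes stagger the buffers' start addresses, so v1[j] through v5[j] are less likely to map to the same cache set for every j.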