42

During program optimization, trying to optimize a loop that iterates through a vector, I found the following fact: ::std::vector::at() is EXTREMELY slower than operator[] !

The operator[] is 5 to 10 times faster than at(), both in release & debug builds (VS2008 x86).

Reading a bit on the web got me to realize that at() has boundary checking. Ok, but, slowing the operation by up to 10 times?!

Is there any reason for that? I mean, boundary checking is a simple number comparison, or am I missing something?
The question is what is the real reason for this performance hit?
Further more, is there any way to make it even faster?

I'm certainly going to swap all my at() calls with [] in other code parts (in which I already have custom boundary check!).

Proof of concept:

#define _WIN32_WINNT 0x0400
#define WIN32_LEAN_AND_MEAN
#include <windows.h>

#include <conio.h>

#include <vector>

#define ELEMENTS_IN_VECTOR  1000000

int main()
{
    __int64 freq, start, end, diff_Result;
    if(!::QueryPerformanceFrequency((LARGE_INTEGER*)&freq))
        throw "Not supported!";
    freq /= 1000000; // microseconds!

    ::std::vector<int> vec;
    vec.reserve(ELEMENTS_IN_VECTOR);
    for(int i = 0; i < ELEMENTS_IN_VECTOR; i++)
        vec.push_back(i);

    int xyz = 0;

    printf("Press any key to start!");
    _getch();
    printf(" Running speed test..\n");

    { // at()
        ::QueryPerformanceCounter((LARGE_INTEGER*)&start);
        for(int i = 0; i < ELEMENTS_IN_VECTOR; i++)
            xyz += vec.at(i);
        ::QueryPerformanceCounter((LARGE_INTEGER*)&end);
        diff_Result = (end - start) / freq;
    }
    printf("Result\t\t: %u\n\n", diff_Result);

    printf("Press any key to start!");
    _getch();
    printf(" Running speed test..\n");

    { // operator []
        ::QueryPerformanceCounter((LARGE_INTEGER*)&start);
        for(int i = 0; i < ELEMENTS_IN_VECTOR; i++)
            xyz -= vec[i];
        ::QueryPerformanceCounter((LARGE_INTEGER*)&end);
        diff_Result = (end - start) / freq;
    }

    printf("Result\t\t: %u\n", diff_Result);
    _getch();
    return xyz;
}

Edit:
Now the value is being assiged to "xyz", so the compiler will not "wipe" it out.

m0meni
  • 16,006
  • 16
  • 82
  • 141
Poni
  • 11,061
  • 25
  • 80
  • 121
  • Perhaps you should actually do something with the elements instead of just requesting them or the compiler could optimize it away. – schnaader Jul 17 '10 at 01:26
  • Didn't understand you. Explain? – Poni Jul 17 '10 at 01:28
  • 1
    Try doing something like `test_int += vec[i]` in the for loops. As you're not doing anything with the vector element, the compiler could optimize this away completely. Also see Ben's answer for this. – schnaader Jul 17 '10 at 01:31
  • Note to all - POC has been changed so now the vector's values are being used (assigned to "xyz"). – Poni Jul 17 '10 at 01:32
  • 2
    Note that even with your change, the optimizer will still win: it can see that you never use the value of `xyz` that is computed, so the operation is still optimized out. You can prevent it from doing so by returning `xyz` (it can't optimize out the return from `main()`). – James McNellis Jul 17 '10 at 01:37
  • 1
    @Poni: Even with your updated code, the points I make below still stand: when I run this program, I see no significant performance difference between the two loops. Perhaps you can give some test results that you have seen. – James McNellis Jul 17 '10 at 01:46
  • Debug: 2504607/107676 .. 2523199/117864 Release: 8268/4255 .. 14171/4215 – Poni Jul 17 '10 at 01:49
  • @IvanaGajic - See this question regading `at`. – jww Nov 07 '16 at 01:42

3 Answers3

59

The reason is that an unchecked access can probably be done with a single processor instruction. A checked access will also have to load the size from memory, compare it with the index, and (assuming it's in range) skip over a conditional branch to the error handler. There may be more faffing around to handle the possibility of throwing an exception. This will be many times slower, and this is precisely why you have both options.

If you can prove that the index is within range without a runtime check then use operator[]. Otherwise, use at(), or add your own check before access. operator[] should be more or less as fast as possible, but will explode messily if the index is invalid.

Mike Seymour
  • 249,747
  • 28
  • 448
  • 644
32

I ran your test code on my machine:

In an unoptimized debug build, the difference between the two loops is insignificant.

In an optimized release build, the second for loop is optimized out entirely (the call to operator[] is likely inlined and the optimizer can see that the loop does nothing and can remove the whole loop).

If I change the body of the loops to do some actual work, e.g., vec.at(i)++; and vec[i]++;, respectively, the difference between the two loops is insignificant.

I don't see this five to tenfold performance difference that you see.

James McNellis
  • 348,265
  • 75
  • 913
  • 977
  • He's running the debug build with iterator debugging enabled. – Hans Passant Jul 17 '10 at 02:27
  • @Hans: My test was with the default settings, so iterator debugging was enabled in the debug build (however, I think this is the expected result--that they would perform roughly equally--since it enables bound checks for `op[]`). If I disable iterator debugging, it gives `op[]` a roughly twofold performance gain over `at()`. (When I posted the answer originally, I wasn't really focusing on the debug build performance, but you're right: iterator debugging can make a big difference in a debug build). – James McNellis Jul 17 '10 at 03:08
4

You don't do anything with the return value, so if the compiler inlines these functions it can optimize them away completely. Or perhaps it can optimize away the subscript ([]) version completely. Running without optimizations is useless from a performance measurement perspective, what you need is some simple but useful program to exercise the functions so they don't just get optimized away. For example you could shuffle the vector (randomly swap 50000 pairs of elements).

Ben Voigt
  • 277,958
  • 43
  • 419
  • 720