Speed of accessing a vector by index: backwards vs forward

Question

Due to caching effects the forward iteration of a vector will be faster than (e.g.) random access:

for (unsigned i=0; i<vec.size(); i++) {
    vec[i] = something(i); 
}

Now I need to iterate it backwards:

for (unsigned i=vec.size(); i-->0; ) {
    vec[i] = something(i);
}

On my system there seems to be no speed difference. Can I assume that the same caching effects apply here and therefore the loops will have the same speed on most systems?

You can't make general assumptions about cache effects on "any system." Every CPU architecture may behave much differently from one another. Even subsequent generations of CPU architectures might have drastically different behavior in some cases. I'm guessing you tested on a contemporary x86 processor, whose hardware prefetching is good at handling sequential accesses in either direction. — Jason R, Apr 09 '15 at 14:45
While you can't make any guarantees, you can make the assumption based on current technology that a forwards and a backwards loop through linear memory should have similar cache prefetching behaviour. — Jonatan Hedborg, Apr 09 '15 at 14:52
@James ? that is the point of the second loop. I need to read it off the end and wonder whether it will affect speed — BlueSun, Apr 09 '15 at 14:52
No, I mean... it reads over the bounds of the vector. You need to start with size() - 1. — James M, Apr 09 '15 at 14:52
@JamesMcLaughlin The second loop is correct because `i-->0` decreases i before the loop starts. — user4098326, Apr 09 '15 at 14:54
@James no it does not. i is decreased by 1 before the loop starts. — BlueSun, Apr 09 '15 at 14:55
the only thing could be that on some architectures decreasing loop (which compares to 0) is faster (in terms of comparing speed). — zoska, Apr 09 '15 at 14:59
[Two hard things](http://martinfowler.com/bliki/TwoHardThings.html) — Drew Dormann, Apr 09 '15 at 15:14
@BlueSun Ah yes, I see. That's a pretty ugly use of the loop condition, considering there's a third section of the for loop specifically designed for this which you just omit. — James M, Apr 09 '15 at 15:58
@James that is actually the better way to do this (if index iterating is needed). How else are you going to run the loop until 0? The condition for (unsigned i=size-1; i >= 0; i--) makes no sense for unsigned i which will always be >=0. Look here for more information: http://stackoverflow.com/questions/4205720/iterating-over-a-vector-in-reverse-direction — BlueSun, Apr 09 '15 at 16:16
Use iterators and reverse iterators. That would have prevented your out of bounds access. — Neil Kirk, Apr 09 '15 at 16:20
@NeilKirk That was the first thing i have tried. However I found that std Iterators are slower than manually iterating with an integer. And since ca. 90% my code consist of such loops, it had impact on overall performance. — BlueSun, Apr 09 '15 at 16:58
Shouldn't be in an optimized build. Plus in debug you get checks for out of bounds on good debuggers, finding lots of bugs. — Neil Kirk, Apr 09 '15 at 17:00
@NeilKirk it was -O3 optimized. And like I've explained before: there is/was no out of bounds access in that code — BlueSun, Apr 09 '15 at 17:07

score 8 · Accepted Answer · edited May 23 '17 at 10:24

Making assumptions about hardware is almost always a poor choice - unless you have a specific subset of hardware in mind. If you intend your code to run near-exclusively on modern x86 hardware (windows 8 desktop machines for example) then it is fair to assume that this reverse iteration will be equivalent in speed (as reverse_iterators generally are)

However, if your aiming for mobile technologies, embedded systems, or older systems there is a much greater degree of variance between what can be expected.

Personally, I've spend a good four years developing optimised software commercially for desktop machines - In that time I have never needed to optimise a vector iteration.

If you find yourself profiling your software on a platform and find that there is a need for optimisation in this regard - you may be better allocating an array and manually iterating over the elements. However more often than not, such optimisations are overzealous early in your development.

You could, I have no doubt, create your own iterator class, vector, or array class to provide faster iterations in your specific case. But doing so will complicate your code greatly - something that will eventually cause issues when it comes to debugging your code.

“Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.” ~ Brian Kernighan

Personally I would recommend Unless you have an example of a targeted platform-set you intend on deploying into that is abnormal, or have a specific scenario where you can produce evidence of a speed difference. Assume the Standard Library Vector is appropriately fast.

That said, depending on the workload you wish to preform, you may find a std::list more appropriate if you have very large amounts of data; see the following link: Benchmarks of STL containers

Thank you for your answer. Since I am using exclusively clusters with modern x86 hardware, I will assume that the iteration order does not matter. — BlueSun, Apr 09 '15 at 16:51

Speed of accessing a vector by index: backwards vs forward

1 Answers1