What is the most effective way of iterating a std::vector and why?

Question

In terms of space-time complexity which of the following is best way to iterate over a std::vector and why?

Way 1:

for(std::vector<T>::iterator it = v.begin(); it != v.end(); ++it) {
    /* std::cout << *it; ... */
}

Way 2:

for(std::vector<int>::size_type i = 0; i != v.size(); i++) {
    /* std::cout << v[i]; ... */
}

Way 3:

for(size_t i = 0; i != v.size(); i++) {
    /* std::cout << v[i]; ... */
}

Way 4:

for(auto const& value: a) {
     /* std::cout << value; ... */

Possible duplicate of [What are the complexity guarantees of the standard containers?](https://stackoverflow.com/questions/181693/what-are-the-complexity-guarantees-of-the-standard-containers) — Michael Chourdakis, Apr 10 '19 at 06:38
They are equivalent. Sometimes the compiler may even turn one into the other. — Marc Glisse, Apr 10 '19 at 06:40
They are equally **effective**. If you want to know which one is more **efficient**, consider changing the title of the question. — Marco13, Apr 10 '19 at 12:12
You are missing an alternative: NOT re-computing `v.size()` every single iteration... — Matthieu M., Apr 10 '19 at 12:51
You need to grok that C++ is an abstraction. Your computer does not execute these statements. You are describing the behaviour of a program, and all those descriptions are exactly the same. — Lightness Races in Orbit, Apr 11 '19 at 02:25
[Premature optimization is the root of all evil](http://c2.com/cgi/wiki?PrematureOptimization) — Barmar, Apr 16 '19 at 17:48
@Bathsheba Are you serious? Could you elaborate on that? I have found https://stackoverflow.com/questions/13130708/what-is-the-advantage-of-using-universal-references-in-range-based-for-loops, but that seems not to give convincing reasons for your way 5. — Benjamin Bihler, May 02 '19 at 08:56
@BenjaminBihler: I'm not sure I can say anything more than the excellent accepted answer. — Bathsheba, May 02 '19 at 08:58

lubgr · Answer 1 · 2019-04-10T07:19:48.787

40

First of all, Way 2 and Way 3 are identical in practically all standard library implementations.

Apart from that, the options you posted are almost equivalent. The only notable difference is that in Way 1 and Way 2/3, you rely on the compiler to optimize the call to v.end() and v.size() out. If that assumption is correct, there is no performance difference between the loops.

If it's not, Way 4 is the most efficient. Recall how a range based for loop expands to

{
   auto && __range = range_expression ;
   auto __begin = begin_expr ;
   auto __end = end_expr ;
   for ( ; __begin != __end; ++__begin) {
      range_declaration = *__begin;
      loop_statement
   }
}

The important part here is that this guarantees the end_expr to be evaluated only once. Also note that for the range based for loop to be the most efficient iteration, you must not change how the dereferencing of the iterator is handled, e.g.

for (auto value: a) { /* ... */ }

this copies each element of the vector into the loop variable value, which is likely to be slower than for (const auto& value : a), depending on the size of the elements in the vector.

Note that with the parallel algorithm facilities in C++17, you can also try out

#include <algorithm>
#include <execution>

std::for_each(std::par_unseq, a.cbegin(), a.cend(),
   [](const auto& e) { /* do stuff... */ });

but whether this is faster than an ordinary loop depends on may circumstantial details.

edited Apr 10 '19 at 07:19

answered Apr 10 '19 at 06:48

lubgr

37,368
3
66
117

1

Good luck finding those C++17 features in a compiler. Those features are virtually non existant. I wish the compilers would just implement the Intel STL version and be done with it already – bremen_matt Apr 10 '19 at 19:57
2

@bremen_matt AFAIK, GCC is really the odd man out here, and both clang and MSVC implement this already? – Krupip Apr 10 '19 at 20:45
The parallel versions? That is, `std::par_unseq` – bremen_matt Apr 11 '19 at 05:25
I don't know about that... I can't get a simple code to compile with either GCC head or Clang head: https://wandbox.org/permlink/cnCxFscI0SWV1UF0 – bremen_matt Apr 11 '19 at 05:32
One place I found tracking the progress is: https://en.cppreference.com/w/cpp/compiler_support – bremen_matt Apr 11 '19 at 05:34
I believe we are talking about `Standardization of Parallelism TS`. That link shows MSVC as the only one even having partial support. I can't test that right now. So could it be that you are running on WIndows? and that is why you think these features are already implemented? – bremen_matt Apr 11 '19 at 05:35
By the way, isn't it `std::execution::par_unseq`? – Marc Glisse Apr 12 '19 at 11:30
@bremen_matt Most likely wandbox is missing Intel TBB, which is a dependency of the new implementation in gcc-9. With a few minor fixes and enabling TBB, godbolt does compile your test. – Marc Glisse Apr 12 '19 at 11:33
This is not a complete answer though. This will only work on Intel processors. I know that TBB was open sourced recently, but it is still a mess on arm processors, like the Raspberry PI. In particular, you need TBB and the Intel STL (as I mentioned in my original comment). It is not at all clear to me how to do that on a non-Intel processor. – bremen_matt Apr 12 '19 at 11:35
I mention this for two reasons: 1) I work on a lot of ARM processors. 2) This fact seems quite relevant because we are currently seeing a shift to cloud-based ARM processors. – bremen_matt Apr 12 '19 at 11:37
For instance, have you compared the AWS Intel vs ARM instances? The ARM ones seem to be just as fast and are half the price. I think that this is the year you will see a lot of people in DevOps starting to push their companies toward ARM – bremen_matt Apr 12 '19 at 11:38
TBB was not "open sourced recently", it only changed from GPL to apache. Intel PSTL is included in gcc-9 (and is supposed to be included in libc++ soon I believe), and at least with debian I have TBB packages for arm, power, mips, etc. – Marc Glisse Apr 12 '19 at 11:52
My understanding was that the feature in question here requires PSTL, not just TBB. That seems to be the primary source of the disagreement at the moment. I am aware of TBB availability for ARM, but not PSTL – bremen_matt Apr 12 '19 at 21:57
I do apologize... I kept writing Intel STL when I meant Intel PSTL. That made no sense. – bremen_matt Apr 12 '19 at 21:58

score 14 · Answer 2 · edited Apr 10 '19 at 16:48

Prefer iterators over indices/keys.

While for vector or array there should be no difference between either form¹, it is a good habit to get into for other containers.

_{¹ As long as you use [] instead of .at() for accesssing by index, of course.}

Memorize the end-bound.

Recomputing the end-bound at each iteration is inefficient for two reasons:

In general: a local variable is not aliased, which is more optimizer-friendly.
On containers other than vector: computing the end/size could be a bit more expensive.

You can do so as a one-liner:

for (auto it = vec.begin(), end = vec.end(); it != end; ++it) { ... }

(This is an exception to the general prohibition on declaring a single variable at a time.)

Use the for-each loop form.

The for-each loop form will automatically:

Use iterators.
Memorize the end-bound.

Thus:

for (/*...*/ value : vec) { ... }

Take built-in types by values, other types by reference.

There is a non-obvious trade-off between taking an element by value and taking an element by reference:

Taking an element by reference avoids a copy, which can be an expensive operation.
Taking an element by value is more optimizer-friendly¹.

At the extremes, the choice should be obvious:

Built-in types (int, std::int64_t, void*, ...) should be taken by value.
Potentially allocating types (std::string, ...) should be taken by reference.

In the middle, or when faced with generic code, I would recommend starting with references: it's better to avoid a performance cliff than attempting to squeeze out the last cycle.

Thus, the general form is:

for (auto& element : vec) { ... }

And if you are dealing with a built-in:

for (int element : vec) { ... }

_{¹ This is a general principle of optimization, actually: local variables are friendlier than pointers/references because the optimizer knows all the potential aliases (or absence, thereof) of the local variable.}

Out of interest, why didn't you consider `for (auto&& element : vec)`? — Bathsheba, Apr 10 '19 at 13:38
@Bathsheba: I find it confusing. We *know* that it will be a reference, so why mark it as a universal reference? — Matthieu M., Apr 10 '19 at 14:27

score 9 · Answer 3 · edited Apr 10 '19 at 14:32

Addition to lubgr's answer:

Unless you discover via profiling the code in question to be a bottleneck, efficiency (which you probably meant instead of 'effectivity') shouldn't be your first concern, at least not on this level of code. Much more important are code readability and maintainability! So you should select the loop variant that reads best, which usually is way 4.

Indices can be useful if you have steps greater than 1 (whyever you would need to...):

for(size_t i = 0; i < v.size(); i += 2) { ... }

While += 2 per se is legal on iterators, too, you risk undefined behaviour at loop end if the vector has odd size because you increment past the one past the end position! (Generally spoken: If you increment by n, you get UB if size is not an exact multiple of n.) So you need additional code to catch this, while you don't with the index variant...

To add on why this should not be considered: either the compiler cannot optimize this and there is no point to use the tools provided by the Standard Library (, , etc.) if you expect a minimum of performance; either it does ([and it *does*](https://godbolt.org/z/j40-j3) and losing time and effort on this minor difference never matters **unless you pinpoint through a profiler** a bottleneck in your implementation. — YSC, Apr 10 '19 at 08:32

score 4 · Answer 4 · answered Apr 10 '19 at 09:25

The lazy answer: The complexities are equivalent.

The time complexity of all solutions is Θ(n).
The space complexity of all solutions is Θ(1).

The constant factors involved in the various solutions are implementation details. If you need numbers, you're probably best off benchmarking the different solutions on your particular target system.

It may help to store v.size() rsp. v.end(), although these are usually inlined, so such optimizations may not be needed, or performed automatically.

Note that indexing (without memoizing v.size()) is the only way to correctly deal with a loop body that may add additional elements (using push_back()). However, most use cases do not need this extra flexibility.

score 2 · Answer 5 · answered Apr 11 '19 at 01:00

Prefer method 4, std::for_each (if you really must), or method 5/6:

void method5(std::vector<float>& v) {
    for(std::vector<float>::iterator it = v.begin(), e = v.end(); it != e; ++it) {
        *it *= *it; 
    }
}
void method6(std::vector<float>& v) {
    auto ptr = v.data();
    for(std::size_t i = 0, n = v.size(); i != n; i++) {
        ptr[i] *= ptr[i]; 
    }
}

The first 3 methods can suffer from issues of pointer aliasing (as alluded to in previous answers), but are all equally bad. Given that it's possible another thread may be accessing the vector, most compilers will play it safe, and re-evaluate [] end() and size() in each iteration. This will prevent all SIMD optimisations.

You can see proof here:

https://godbolt.org/z/BchhmU

You'll notice that only 4/5/6 make use of the vmulps SIMD instructions, where as 1/2/3 only ever use the non-SIMD vmulss instructiuon.

Note: I'm using VC++ in the godbolt link because it demonstrates the problem nicely. The same problem does occur with gcc/clang, but it's not easy to demonstrate it with godbolt - you usually need to disassemble your DSO to see this happening.

score 1 · Answer 6 · answered Apr 10 '19 at 11:23

For completeness, I wanted to mention that your loop might want to change the size of the vector.

std::vector<int> v = get_some_data();
for (std::size_t i=0; i<v.size(); ++i)
{
    int x = some_function(v[i]);
    if(x) v.push_back(x);
}

In such an example you have to use indices and you have to re-evaluate v.size() in every iteration.

If you do the same with a range-based for loop or with iterators, you might end up with undefined behavior since adding new elements to a vector might invalidate your iterators.

By the way, I prefer to use while-loops for such cases over for-loops but that's another story.

An important supplemental answer because the OP's question specifically called out iterating over `std::vector` - as opposed to iterating over an arbitrary STL-ish container. — davidbak, Apr 10 '19 at 16:48

score 1 · Answer 7 · answered Apr 10 '19 at 14:55

It depends to a large extent on what you mean by "effective".

Other answers have mentioned efficiency, but I'm going to focus on the (IMO) most important purpose of C++ code: to convey your intent to other programmers¹.

From this perspective, method 4 is clearly the most effective. Not just because there are fewer characters to read, but mainly because there's less cognitive load: we don't need to check whether the bounds or step size are unusual, whether the loop iteration variable (i or it) is used or modified anywhere else, whether there's a typo or copy/paste error such as for (auto i = 0u; i < v1.size(); ++i) { std::cout << v2[i]; }, or dozens of other possibilities.

Quick quiz: Given std::vector<int> v1, v2, v3;, how many of the following loops are correct?

for (auto it = v1.cbegin();  it != v1.end();  ++it)
{
    std::cout << v1[i];
}

for (auto i = 0u;  i < v2.size();  ++i)
{
    std::cout << v1[i];
}

for (auto const i: v3)
{
    std::cout << i;
}

Expressing the loop control as clearly as possible allows the developer's mind to hold more understanding of the high-level logic, rather than being cluttered with implementation details - after all, this is why we're using C++ in the first place!

¹ To be clear, when I'm writing code, I consider the most important "other programmer" to be Future Me, trying to understand, "Who wrote this rubbish?"...

score 0 · Answer 8 · answered Apr 10 '19 at 06:48

All of the ways you listed have identical time complexity and identical space complexity (no surprise there).

Using the for(auto& value : v) syntax is marginally more efficient, because with the other methods, the compiler may re-load v.size() and v.end() from memory every time you do the test, whereas with for(auto& value : v) this never occurs (it only loads the begin() and end() iterators once).

We can observe a comparison of the assembly produced by each method here: https://godbolt.org/z/LnJF6p

On a somewhat funny note, the compiler implements method3 as a jmp instruction to method2.

*"On a somewhat funny note, the compiler implements method3 as a jmp instruction to method2."* It does that because the backend code generator knows that they are equivalent (in other words, they produce *exactly* the same machine code). — Cody Gray - on strike, Apr 10 '19 at 16:50

score 0 · Answer 9 · answered Apr 21 '19 at 21:20

The complexity is the same for all except the last one that is in theory faster because the end of the container is evaluated only once.

Last one is also the nicest to read and to write, but has the drawback that doesn't give you the index (that is quite often important).

You are however ignoring what I think is a good alternative (it's my preferred one when I need the index and cannot use for (auto& x : v) {...}):

for (int i=0,n=v.size(); i<n; i++) {
    ... use v[i] ...
}

note that I used int and not size_t and that the end is computed only once and also available in the body as a local variable.

Often when the index and the size are needed then math computations are also performed on them and size_t behaves "strangely" when used for math (for example a+1 < b and a < b-1 are different things).

What is the most effective way of iterating a std::vector and why?

9 Answers9

Prefer iterators over indices/keys.

Memorize the end-bound.

Use the for-each loop form.

Take built-in types by values, other types by reference.