12

This may seem frivolous to some of you, but which of the following 2 methods of iteration over a STL container is better? Why?

class Elem;
typedef vector<Elem> ElemVec;
ElemVec elemVec;

// Method 0
for (ElemVec::iterator i = elemVec.begin(); i != elemVec.end(); ++i)
{
    Elem& e = *i;
    // Do something
}

// Method 1
for (int i = 0; i < elemVec.size(); ++i)
{
    Elem& e = elemVec.at(i);
    // Do something
}

Method 0 seems like cleaner STL, but Method 1 achieves the same with lesser code. Simple iteration over a container is what appears all over the place in any source code. So, I'm inclined to pick Method 1 which seems to reduce visual clutter and code size.

PS: I know iterators can do much more than a simple index. But, please keep the reply/discussion focused on simple iteration over a container like shown above.

Ashwin Nanjappa
  • 76,204
  • 83
  • 211
  • 292

9 Answers9

16

The first version works with any container and so is more useful in template functions that take any container a s a parameter. It is also conceivably slightly more efficient, even for vectors.

The second version only works for vectors and other integer-indexed containers. It'd somewhat more idiomatic for those containers, will be easily understood by newcomers to C++, and is useful if you need to do something else with the index, which is not uncommon.

As usual, there is no simple "this one is better" answer, I'm afraid.

  • Thanks Neil. My code isn't usually working with templates but vectors whose type is known. Could you elaborate about why Method 0 is more efficient in your answer? – Ashwin Nanjappa Apr 04 '09 at 08:34
  • 1
    It may be slightly more efficient if the iterator is actually implemented directly as a pointer. Note use of words "may" and "slightly" - you need to measure to be sure. –  Apr 04 '09 at 08:39
  • Yes, indeed the iterator is nothing more than a pointer for most containers. But, how does that make the code faster? AFAIK even Method 1 ends up being pointer arithmetic, isn't it? – Ashwin Nanjappa Apr 04 '09 at 08:42
  • 1
    @ash yes, but there is the arithmetic to do (ptr+index) as well as (index++), not to mention that [] may be a function call if it has not been inlined. But like I said - you need to measure. –  Apr 04 '09 at 08:46
  • For the record I've never seen a measurable difference in performance. – Robert Gould Apr 04 '09 at 09:07
  • The indexed (at) method 0 will be considerably slower for a list, correct? If it's implemented as a linked list it's O(N). – Andrew Jaffe Apr 04 '09 at 09:14
  • @andrew correct - I keep forgetting list supports indexed access, as I can't imagine anyone being mad enough to use it :-) –  Apr 04 '09 at 09:29
  • @neil actually, I don't think list does implement at() or operator[], after checking the docs online. But even size() is O(N). – Andrew Jaffe Apr 04 '09 at 09:34
  • @andrew You are right - shows how often I use list i.e. never! –  Apr 04 '09 at 09:48
  • Yeah my no performance difference applies to vector versus vector only. Anyways I do use maps for performance oblivious code, in that case only option is method 0 – Robert Gould Apr 04 '09 at 12:06
8

If you don't mind a (very?) small loss of efficiency, i'd recommend using Boost.Foreach

BOOST_FOREACH( Elem& e, elemVec )
{
    // Your code
}
Benoît
  • 16,798
  • 8
  • 46
  • 66
6

Method 0 is faster and therefore recommended.

Method 1 uses size() which is allowed to be O(1), depending on the container and the stl implementation.

Stefan
  • 43,293
  • 10
  • 75
  • 117
  • Thanks Stefan, I had forgotten about size() :-) – Ashwin Nanjappa Apr 04 '09 at 09:00
  • 1
    size() should be O(1) in a standard compliant vector. And it is not less efficient than v.end() as it will probably be inlined. Efficiency here is just the same (no more than a couple of instructions difference per access) – David Rodríguez - dribeas Apr 04 '09 at 11:53
  • @DavidRodríguez-dribeas: Efficiency isn't usually the same because the first method requires an extra pointer load (i.e. loading the pointer to the data, before adding the offset). If you have any other code alongside this, the compiler can get confused about aliasing and avoid caching that pointer, making you issue twice as many loads from memory as you need. It's unlikely to happen with a trivial loop but those don't come up in practice. – user541686 Aug 26 '13 at 08:55
  • @Mehrdad: The caching of the `size()` is probably not the issue with the initial code (the comment was addressed to just the caching of `size()`). In the OP, access to the vector is through `at()`, which involves an extra branch in the code, and some extra code. – David Rodríguez - dribeas Aug 26 '13 at 12:52
  • @DavidRodríguez-dribeas: I said *the caching of the pointer*. `size()` is not a pointer. I was talking about `begin()` and `end()` -- using iterators/pointers is generally faster than indexing because it requires fewer loads. (`at()` is obviously slower but I was talking about regular, unchecked indexing.) – user541686 Aug 26 '13 at 17:53
5

The following method of iteration over a standard library container is best.

Use (and beyond)'s range-based for-loop with the auto specifier:

// Method 2
for (auto& e: elemVec)
{
    // Do something with e...
}

This is similar to your Method 0 but requires less typing, less maintenence and works with any container compatible with std::begin() and std::end(), including plain-old arrays.

johnsyweb
  • 136,902
  • 23
  • 188
  • 247
4

Some more advantages of method 0:

  • if you move from vector to another container the loop remains the same,
  • easy to move from iterator to const_iterator if you need,
  • when c++0x will arive, auto typing will reduce some of the code clutter.

The main disadvantage is that in many cases you scan two containers, in which case an index is cleaner than keeping two iterators.

jalf
  • 243,077
  • 51
  • 345
  • 550
David Lehavi
  • 1,186
  • 7
  • 16
2

Coincidentally I made a speed test recently (filling 10 * 1024 * 1024 ints with rand() ).
These are 3 runs, time in nano-seconds

vect[i] time      : 373611869  
vec.at(i) time    : 473297793  
*it = time        : 446818590  
arr[i] time       : 390357294  
*ptr time         : 356895778  

UPDATE : added stl-algorithm std::generate, which seems to run the fastest, because of special iterator-optimizing (VC++2008). time in micro-seconds.

vect[i] time      : 393951
vec.at(i) time    : 551387
*it = time        : 596080
generate = time   : 346591
arr[i] time       : 375432
*ptr time         : 334612

Conclusion : Use standard-algorithms, they might be faster than a explicit loop ! (and also good practice)

Update : the above times were in a I/O-bound situation, I did the same tests with a CPU-bound (iterate over a relatively short vector, which should fit in cache repeatedly, multiply each element by 2 and write back to vector)

//Visual Studio 2008 Express Edition
vect[i] time      : 1356811
vec.at(i) time    : 7760148
*it = time        : 4913112
for_each = time   : 455713
arr[i] time       : 446280
*ptr time         : 429595

//GCC
vect[i] time      : 431039
vec.at(i) time    : 2421283
*it = time        : 381400
for_each = time   : 380972
arr[i] time       : 363563
*ptr time         : 365971  

Interestingly iterators and operator[] is considerably slower in VC++ compared to for_each (which seems to degrade the iterators to pointers through some template-magic for performance).
In GCC access times are only worse for at(), which is normal, because it's the only range-checked function of the tests.

qwerty
  • 373
  • 2
  • 8
  • Barely any statistical difference. Anything about 10% won't have an impact when an actual program is actually in use unless this is in a freqiently used tight loop. A cache miss, and the time would be equal – Robert Gould Apr 04 '09 at 12:14
  • If you define #define _SECURE_SCL 0 #define _SECURE_SCL_THROWS 0 there will be no difference between pointer and iterator performance. – Vadim Ferderer Apr 05 '09 at 00:58
2

Method 0, for several reasons.

  • It better expresses your intent, which aids compiler optimizations as well as readability
  • Off-by-one errors are less likely
  • It works even if you replace the vector with a list, which doesn't have operator[]

Of course, the best solution will often be solution 2: One of the std algorithms. std::for_each, std::transform, std::copy or whatever else you need. This has some further advantages:

  • It expresses your intent even better, and allows some significant compiler optimizations (MS's secure SCL performs bounds checking on your methods 0 and 1, but will skip it on std algorithms)
  • It's less code (at the call site, at least. Of course you have to write a functor or something to replace the loop body, but at the use site, the code gets cleaned up quite a bit, which is probably where it matters most.

In general, avoid overspecifying your code. Specify exactly what you want done, and nothing else. The std algorithms are usually the way to go there, but even without them, if you don't need the index i, why have it? Use iterators instead, in that case.

jalf
  • 243,077
  • 51
  • 345
  • 550
1

It depends on which type of container. For a vector, it probably doesn't matter which you use. Method 0 has become more idiomatic, but their isn't a big difference, as everyone says.

If you decided to use a list, instead, method 1 would, in principle, be O(N), but in fact there is no list at() method, so you can't even do it that way. (So at some level your question stacks the deck.)

But that in itself is another argument for method 0: it uses the same syntax for different containers.

Andrew Jaffe
  • 26,554
  • 4
  • 50
  • 59
0

A possibility not considered above: depending on the details of "Do something", one can have method 0 and method 1 simultaneously, you don't have to choose:

for (auto i = elemVec.begin(), ii = 0; ii < elemVec.size(); ++i, ++ii)
{
    // Do something with either the iterator i or the index ii
}

This way, finding the index or accessing the corresponding member are both obtained with trivial complexity.