64

In terms of performance, what would work faster? Is there a difference? Is it platform dependent?

//1. Using vector<string>::iterator:
vector<string> vs = GetVector();

for(vector<string>::iterator it = vs.begin(); it != vs.end(); ++it)
{
   *it = "Am I faster?";
}

//2. Using size_t index:
for(size_t i = 0; i < vs.size(); ++i)
{
   //One option:
   vs.at(i) = "Am I faster?";
   //Another option:
   vs[i] = "Am I faster?";
}
unwind
  • 391,730
  • 64
  • 469
  • 606
Gal Goldman
  • 8,641
  • 11
  • 45
  • 45
  • 10
    I have been doing benchmarks myself, and vector.at is much slower than using an iterator, while vector[i] is much faster than using an iterator. However, you can make the loop even faster by grabbing the pointer to the first element and looping while the current pointer is less than or equal to the pointer to the last element; similar to iterators, but with less overhead, and consequently not as nice to look at code-wise. This test was done on Windows with Visual Studio 2008. Concerning your question, I do believe it's platform dependent; it depends on the implementation. – leetNightshade Feb 27 '12 at 17:35
  • 1
    However, continuing from my off-topic point about iterating the pointers yourself: that should always be faster no matter the platform. – leetNightshade Feb 27 '12 at 17:36
  • 1
    @leetNightshade: Certain compilers, when encountering subscripts instead of pointer arithmetic, can use SIMD instructions, which would make it faster. –  Apr 13 '13 at 03:14
  • 2
    You are instantiating the end iterator every time you loop, and iterator instantiation isn't free. Try caching your end iterator. Try this: `for(vector::iterator it = v.begin(), end = v.end(); it != end; ++it) { ... }` – mchiasson Mar 15 '15 at 12:40

16 Answers

45

With an iterator, advancing is a pointer increment and accessing an element is a pointer dereference.
With an index, advancing should be equally fast, but looking up an element involves an addition (data pointer + index) before the dereference; still, the difference should be marginal.
at() additionally checks that the index is within bounds, so it can be slower.

Benchmark results for 500M iterations, vector size 10, with gcc 4.3.3 (-O3), linux 2.6.29.1 x86_64:
at(): 9158ms
operator[]: 4269ms
iterator: 3914ms

YMMV, but if using an index makes the code more readable/understandable, you should do it.

2021 update

With modern compilers, all options are practically free, but iterators are very slightly better for iterating and easier to use with range-for loops (for(auto& x: vs)).

Code:

#include <vector>

void iter(std::vector<int> &vs) {
    for(std::vector<int>::iterator it = vs.begin(); it != vs.end(); ++it)
        *it = 5;
}

void index(std::vector<int> &vs) {
    for(std::size_t i = 0; i < vs.size(); ++i)
        vs[i] = 5;
}

void at(std::vector<int> &vs) {
    for(std::size_t i = 0; i < vs.size(); ++i)
        vs.at(i) = 5;
}

The generated assembly for index() and at() is identical ([godbolt](https://godbolt.org/z/cv6Kv4b6f)), but the loop setup for iter() is three instructions shorter:

iter(std::vector<int, std::allocator<int> >&):
        mov     rax, QWORD PTR [rdi]
        mov     rdx, QWORD PTR [rdi+8]
        cmp     rax, rdx
        je      .L1
.L3:                              ; loop body
        mov     DWORD PTR [rax], 5
        add     rax, 4
        cmp     rax, rdx
        jne     .L3
.L1:
        ret
index(std::vector<int, std::allocator<int> >&):
        mov     rax, QWORD PTR [rdi]
        mov     rdx, QWORD PTR [rdi+8]
        sub     rdx, rax
        mov     rcx, rdx
        shr     rcx, 2
        je      .L6
        add     rdx, rax
.L8:                              ; loop body
        mov     DWORD PTR [rax], 5
        add     rax, 4
        cmp     rdx, rax
        jne     .L8
.L6:
        ret
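For completeness, here is a sketch of the range-for version mentioned in the 2021 update (the function name `range_for` is mine, not part of the original benchmark); it desugars to the same begin()/end() iterator loop, so it should compile to essentially the same code as iter():

```cpp
#include <vector>

// Range-for desugars to the begin()/end() iterator loop,
// so it should generate the same code as iter().
void range_for(std::vector<int> &vs) {
    for (int &x : vs)
        x = 5;
}
```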
tstenner
  • 10,080
  • 10
  • 57
  • 92
  • Which OS and compiler were the profiled results from? Which implementation of STL were they using? Were the results made with or without optimizations turned on? Be careful, all of this may change the results. To be sure you should profile your own code in your own environment. – Brian R. Bondy Apr 22 '09 at 11:11
  • 5
    -1 sorry. If you look here: http://www.velocityreviews.com/forums/showpost.php?p=1502464&postcount=13, you'll see that this guy **didn't use any compiler optimisation flags**, so the results are essentially meaningless. – j_random_hacker Apr 22 '09 at 11:12
  • 1
    -1 Agree with j_random_hacker - if you read the thread all the way through, there's some interesting stuff about the pitfalls of profiling, and also some more reliable results. – James Hopkin Apr 22 '09 at 11:14
  • 1
    -1, indeed. Quoting numbers without understanding them seems to be a trap that got both tstenner and the benchmarker. – MSalters Apr 22 '09 at 11:15
  • 2
    +2 now that you've updated with more sensible measuring criteria :) – j_random_hacker Apr 22 '09 at 16:55
  • A natural question: why is at() so much slower than operator[]? They are essentially synonyms, so why aren't they implemented identically? – Michael Mar 16 '15 at 20:33
  • 4
    @Michael `at()` performs bounds checking, so it's `data[i]` vs. something like `if (i < size()) data[i]; else throw` – tstenner Mar 16 '15 at 21:14
  • @Michael Also, it's "so much slower" only when compared to something extremely fast to begin with; the vector is *designed* around efficient random access. That `at` takes twice as much time only means that fetching the element length and doing the comparison takes about as much time as fetching another element - which is a sensible result. – user4815162342 Nov 01 '16 at 09:39
30

Why not write a test and find out?

Edit: My bad - I thought I was timing the optimised version but wasn't. On my machine, compiled with g++ -O2, the iterator version is slightly slower than the operator[] version, but probably not significantly so.

#include <vector>
#include <iostream>
#include <ctime>
using namespace std;

int main() {
    const int BIG = 20000000;
    vector <int> v;
    for ( int i = 0; i < BIG; i++ ) {
        v.push_back( i );
    }

    time_t now = time(0);
    cout << "start" << endl;
    long long n = 0;
    for(vector<int>::iterator it = v.begin(); it != v.end(); ++it) {
        n += *it;
    }

    cout << time(0) - now << endl;
    now = time(0);
    for(size_t i = 0; i < v.size(); ++i) {
        n += v[i];
    }
    cout << time(0) - now << endl;

    return n != 0;
}
Jared Burrows
  • 54,294
  • 25
  • 151
  • 185
  • 3
    Did you test with full optimisation and try it with both the iterator version first and with the array version first? There may be a slight difference in performance but 2x? Not a chance. – James Hopkin Apr 22 '09 at 11:17
  • You'll get better performance measurements by using clock() rather than time(), or use whatever high-resolution timer your OS kernel provides. – Kristopher Johnson Apr 22 '09 at 12:18
  • I'm not really all that interested. Higher resolution is a bit meaningless when you consider all the other activities that can be going on in a modern OS while your code runs. But feel free to edit my answer to incorporate your suggestions. –  Apr 22 '09 at 12:36
  • 5
    in my tests (using "time" shell builtin and all cout's disabled and one test commented out each time) both versions are equally fast (changed the code so it allocates in the constructor, each element has value "2"). actually the time changes in each test with around 10ms, which i suspect is because of the non-determinism of memory allocation. and sometimes the one, and sometimes the other test is 10ms faster than the other. – Johannes Schaub - litb Apr 22 '09 at 12:38
  • 1
    @litb - yes, I suspect the slight differences on my machine may be due to its lack of memory. I didn't mean to imply the difference was significant. –  Apr 22 '09 at 12:48
  • on x86 and similar platforms, *(a) takes the same number of cycles as *(a+b), at least for selected registers a and b. There are minor differences - e.g. instruction length and, sometimes, pairing - but generally they should run the same. – peterchen Apr 22 '09 at 14:02
  • 4
    @anon: It's not about higher resolution. It's about using `clock()` rather than `time()` to explicitly ignore "all the other activities that can be going on in a modern OS while your code runs". `clock()` measures CPU time used for that process alone. – Lightness Races in Orbit Mar 05 '11 at 19:20
  • Apple LLVM version 4.2 (clang-425.0.28) (based on LLVM 3.2svn) Target: x86_64-apple-darwin12. – David McKeone Sep 28 '13 at 07:22
  • 4
    You are instantiating the end iterator every time you loop, and iterator instantiation isn't free. Try caching your end iterator. Try this: `for(vector::iterator it = v.begin(), end = v.end(); it != end; ++it) { ... }` – mchiasson Mar 15 '15 at 12:41
19

Since you're looking at efficiency, you should realise that the following variations are potentially more efficient:

//1. Using vector<string>::iterator:

vector<string> vs = GetVector();
for(vector<string>::iterator it = vs.begin(), end = vs.end(); it != end; ++it)
{
   //...
}

//2. Using size_t index:

vector<string> vs = GetVector();
for(size_t i = 0, size = vs.size(); i != size; ++i)
{
   //...
}

since the end/size function is only called once rather than every time through the loop. It's likely that the compiler will inline these functions anyway, but this way makes sure.

James Hopkin
  • 13,797
  • 1
  • 42
  • 71
  • The question isn't about how to write efficient code, it is about iterators vs. indexes, but thanks for the input – Gal Goldman Apr 22 '09 at 12:55
  • 1
    Finally! the right answer on how to profile this correctly. – mchiasson Mar 15 '15 at 12:42
  • @GalGoldman Unfortunately, if you don't cache your end iterator, the iterator way has an unfair disadvantage over the `[]` way. Iterators are expensive to instantiate. This is also why I tend to use while loops instead of for loops when I use iterators. It forces me to cache my iterators. – mchiasson Mar 15 '15 at 12:45
  • 1
    @mchiasson Why does using a `while` loop 'force you to cache your iterators'? A naive way to use such a loop would be `auto it = vector.begin(); while ( it++ != vector.end() ) WatchMeNotCacheAnyIterators();` The problem remains: the onus is on the user not to write the slightly shorter, but potentially much less efficient, code. – underscore_d May 11 '17 at 18:06
  • 5
    @underscore_d true. I don't know what I was thinking 2 years ago lol. – mchiasson Jun 09 '17 at 01:49
18

If you don't need indexing, don't use it. The iterator concept is there for your benefit. Iterators are very easy to optimize, while direct access needs some extra knowledge.

Indexing is meant for direct access. The brackets and the at method do this. at will, unlike [], check for out of bounds indexing, so it will be slower.

The credo is: don't ask for what you don't need. Then the compiler won't charge you for what you don't use.
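A minimal sketch of that difference (the helper name `at_checks` and the values are mine, for illustration only):

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Returns true iff at(i) throws for this vector, i.e. i is out of bounds.
// operator[] performs no such check: an out-of-range v[i] is undefined
// behaviour rather than an exception.
bool at_checks(const std::vector<int> &v, std::size_t i) {
    try {
        v.at(i);
        return false;
    } catch (const std::out_of_range &) {
        return true;
    }
}
```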

xtofl
  • 40,723
  • 12
  • 105
  • 192
6

As everyone else here is saying, do benchmarks.

Having said that, I would argue that the iterator is faster, since at() does range checking as well, i.e. it throws an out_of_range exception if the index is out of bounds. That check itself probably incurs some overhead.

Mats Fredriksson
  • 19,783
  • 6
  • 37
  • 57
5

I would guess the first variant is faster.

But it's implementation dependent. To be sure you should profile your own code.

Why profile your own code?

Because these factors will all vary the results:

  • Which OS
  • Which compiler
  • Which implementation of STL was being used
  • Were optimizations turned on?
  • ... (other factors)
Brian R. Bondy
  • 339,232
  • 124
  • 596
  • 636
  • Also highly important: the surrounding code into which the STL container accesses are being inlined could favour one approach vs. another for some compilers and target platforms. (The OS is least likely to matter, but the target architecture may.) Obviously optimizations need to be on for this to be worth discussing: un-optimized STL C++ is not worth considering. – Peter Cordes Apr 18 '16 at 16:50
  • I think your answer explains why it isn't enough to profile on my own machine, if it's code I will be redistributing - I need a sense of what it might do on the generic machine of a generic user, not what it does on mine. – Francesco Dondi Jun 08 '16 at 08:28
3

It really depends on what you are doing, but if you have to keep re-declaring the iterator, iterators become marginally slower. In my tests, the fastest possible iteration was to declare a plain pointer to your vector's array and iterate through that.

for example:

Vector Iteration and pulling two functions per pass.

vector<MyTpe> avector(128);
vector<MyTpe>::iterator B=avector.begin();
vector<MyTpe>::iterator E=avector.end()-1;
for(int i=0; i<1024; ++i){
 B=avector.begin();
   while(B!=E)
   {
       float t=B->GetVal(Val1,12,Val2); float h=B->GetVal(Val1,12,Val2);
    ++B;
  }}

Vector Took 90 clicks (0.090000 seconds)

But if you did it with pointers...

for(int i=0; i<1024; ++i){
MyTpe *P=&(avector[0]);
   for(size_t p=0; p<avector.size(); ++p)
   {
   float t=P->GetVal(Val1,12,Val2); float h=P->GetVal(Val1,12,Val2);
   ++P;
   }}

Vector Took 18 clicks (0.018000 Seconds)

Which is roughly equivalent to...

MyTpe Array[128];
for(int i=0; i<1024; ++i)
{
   for(int p=0; p<128; ++p){
    float t=Array[p].GetVal(Val1, 12, Val2); float h=Array[p].GetVal(Val2,12,Val2);
    }}

Array Took 15 clicks (0.015000 seconds).

If you eliminate the call to avector.size(), the time becomes the same.

Finally, calling with [ ]

for(int i=0; i<1024; ++i){
   for(size_t j=0; j<avector.size(); ++j){
   float t=avector[j].GetVal(Val1,12,Val2); float h=avector[j].GetVal(Val1,12,Val2);
   }}

Vector Took 33 clicks (0.033000 seconds)

Timed with clock()

adammonroe
  • 86
  • 1
3

It depends.

The answer is much more subtle than the existing answers show.

at is always slower than iterators or operator[].
But for operator[] vs. iterators, it depends on:

  1. How exactly you're using operator[].

  2. Whether your particular CPU has index registers (ESI/EDI on x86).

  3. How much other code also uses the same index passed to operator[].
    (e.g., are you indexing through multiple arrays in lockstep?)

Here's why:

  1. If you do something like

    std::vector<unsigned char> a, b;
    for (size_t i = 0; i < n; ++i)
    {
        a[13 * i] = b[37 * i];
    }
    

    Then this code will likely be much slower than the iterator version, since it performs a multiplication operation at each iteration of the loop!

    Similarly, if you do something like:

    struct T { unsigned char a[37]; };
    std::vector<T> a;
    for (size_t i = 0; i < n; ++i)
    {
        a[i] = foo(i);
    }
    

    Then this will probably also be slower than the iterator version, because sizeof(T) is not a power of 2, and therefore you are (again) multiplying by 37 each time you loop!

  2. If your CPU has index registers, then your code can perform as well or even better with indices rather than with iterators, if using the index register frees up another register for use in the loop. This is not something you can tell just by looking; you'd have to profile the code and/or disassemble it.

  3. If multiple arrays can share the same index, then the code only has to increment one index instead of incrementing multiple iterators, which reduces writes to memory and thus generally increases performance. However, if you're only iterating over a single array, then an iterator may very well be faster, since it avoids the need to add an offset to an existing base pointer.

In general, you should prefer iterators to indices, and indices to pointers, until and unless profiling shows a bottleneck where switching would help. Iterators are general-purpose and already likely to be the fastest approach; they don't require the data to be randomly addressable, which allows you to swap containers if necessary. Indices are the next preferred tool: they still don't require direct access to the data, they are invalidated less frequently, and you can e.g. substitute a deque for a vector without any problems. Pointers should be a last resort, and they will only prove beneficial if iterators aren't already degenerating to pointers in release mode.
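Here is a sketch of the strided-access pattern from point 1; the function names and the stride of 37 are illustrative only. Whether the multiply in the indexed version survives into the generated code depends on the compiler's strength reduction, as the comments below note:

```cpp
#include <cstddef>
#include <vector>

// Indexed version: the subscript computes 37 * i on each iteration
// (unless the compiler strength-reduces it to an addition).
unsigned char sum_indexed(const std::vector<unsigned char> &b, std::size_t n) {
    unsigned char s = 0;
    for (std::size_t i = 0; i < n; ++i)
        s += b[37 * i];
    return s;
}

// Pointer version: the stride is a plain addition by construction.
unsigned char sum_strided(const std::vector<unsigned char> &b, std::size_t n) {
    unsigned char s = 0;
    const unsigned char *p = b.data();
    for (std::size_t i = 0; i < n; ++i, p += 37)
        s += *p;
    return s;
}
```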

user541686
  • 205,094
  • 128
  • 528
  • 886
  • 1
    It's not index registers, it's indexed [addressing modes](http://stackoverflow.com/questions/34058101/referencing-the-contents-of-a-memory-location-x86-addressing-modes/34058400#34058400) like `[rax + rcx*4]` that lets the compiler increment one index instead of incrementing multiple pointers. It doesn't free up registers, though. You still need a register for every base pointer. If anything it will use an extra register. (A pointer-increment loop could spill an end pointer, and compare against it in memory for an end condition, instead of keeping a loop counter in a reg at all.) – Peter Cordes Apr 18 '16 at 16:54
  • 1
    re: multiply: compilers are smart enough to do the strength-reduction optimization. You should get an increment by 37 for either loop, instead of a multiply of the loop counter. On some CPUs, multiply is slow-ish. On modern Intel CPUs, `imul r32, r32, imm32` is 1 uop, 3c latency, one per 1c throughput. So it's quite cheap. gcc should probably stop breaking down multiplies by small constants into multiple `LEA` instructions if it takes more than one, esp. with `-mtune=haswell` or other recent Intel CPU. – Peter Cordes Apr 18 '16 at 17:01
2

You can use this test code and compare the results! Do it!

#include <vector> 
#include <iostream> 
#include <ctime> 
using namespace std; 


struct AAA{
    int n;
    string str;
};
int main() { 
    const int BIG = 5000000; 
    vector <AAA> v; 
    for ( int i = 0; i < BIG; i++ ) { 
        AAA a = {i, "aaa"};
        v.push_back( a ); 
    } 

    clock_t now;
    cout << "start" << endl; 
    long long n = 0; 
    now = clock(); 
    for(vector<AAA>::iterator it = v.begin(); it != v.end(); ++it) { 
        n += it->n; 
    } 
   cout << clock() - now << endl; 

    n = 0;
    now = clock(); 
    for(size_t i = 0; i < v.size(); ++i) { 
        n += v[i].n; 
    } 
    cout << clock() - now << endl; 

    getchar();
    return n != 0; 
} 
Mostaaf
  • 21
  • 1
  • 1
    Uhm … that's not really all that different from Neil's code. Why bother posting it? – Konrad Rudolph Feb 12 '10 at 10:11
  • 1
    You are instantiating the end iterator every time you loop, and iterator instantiation isn't free. Try caching your end iterator. Try this: `for(vector::iterator it = v.begin(), end = v.end(); it != end; ++it) { ... }` – mchiasson Mar 15 '15 at 12:46
2

The first one will be faster in debug mode, because index access creates iterators behind the scenes; in release mode, where everything should be inlined, the difference should be negligible or zero.

j_random_hacker
  • 50,331
  • 10
  • 105
  • 169
Zorglub
  • 2,077
  • 1
  • 19
  • 22
  • 1
    `in debug mode [...] index access creates iterators behind the scene` That's going to be a gigantic [citation needed] from me. What stdlib implementation does this? Please link to the exact line of code. – underscore_d May 11 '17 at 18:09
1

I found this thread now when trying to optimize my OpenGL code and wanted to share my results even though the thread is old.

Background: I have 4 vectors, sizes ranging from 6 to 12. Writes happen only once at the beginning of the code, and reads occur for each of the elements in the vectors every 0.1 milliseconds.

The following is the stripped down version of the code used first:

for(vector<T>::iterator it = someVector.begin(); it < someVector.end(); it++)
{
    T a = *it;

    // Various other operations
}

The frame rate using this method was about 7 frames per second (fps).

However, when I changed the code to the following, the frame rate almost doubled to 15fps.

for(size_t index = 0; index < someVector.size(); ++index)
{
    T a = someVector[index];

    // Various other operations
}
Karthik
  • 143
  • 2
  • 10
  • Have you tried pre-incrementing the iterator instead? Since post-inc requires an extra copy step this might have an influence. – Mike Lischke May 03 '13 at 08:26
  • 1
    You are instantiating the end iterator every time you loop, and iterator instantiation isn't free. Try caching your end iterator. Try this: `for(vector::iterator it = someVector.begin(), end = someVector.end(); it != end; ++it) { ... }` – mchiasson Mar 15 '15 at 12:47
  • Yeah, this is a totally unfair test, as the (nothing personal, but) naive and sloppy code means it artificially cripples the iterator case. – underscore_d May 11 '17 at 18:13
1

I think the only answer could be a test on your platform. Generally the only thing which is standardized in the STL is the type of iterators a collection offers and the complexity of algorithms.

I would say that there is no (or not much of a) difference between those two versions. The only difference I could think of would be that the code has to iterate through the whole collection when it has to compute the length of an array (in practice the size is stored inside the vector, so size() is constant-time and this overhead doesn't arise).

Accessing the elements with at() should take a little longer than directly accessing them with [], because it checks whether you are within the bounds of the vector and throws an exception if you are out of bounds ([] normally just uses pointer arithmetic, so it should be faster).

bernhardrusch
  • 11,670
  • 12
  • 48
  • 59
1

If you are using VisualStudio 2005 or 2008, to get the best performance out of the vector you'll need to define _SECURE_SCL=0

By default _SECURE_SCL is on, which makes iterating over a container significantly slower. That said, leave it on in debug builds, since it makes tracking down errors much easier. One word of caution: since the macro changes the size of iterators and containers, you have to define it consistently across all compilation units that share an STL container.
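If you go this route, the define has to appear before any standard header is pulled in; for example (MSVC-specific configuration fragment, not portable code):

```cpp
// VS2005/2008 only: disable checked iterators for release builds.
// Must appear before any standard library include, and must be set
// identically in every translation unit sharing these containers.
#define _SECURE_SCL 0
#include <vector>
```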

Stephen Nutt
  • 3,258
  • 1
  • 21
  • 21
0

The difference should be negligible. std::vector guarantees that its elements are laid out contiguously in memory, so most STL implementations implement iterators into std::vector as plain pointers. With this in mind, the only difference between the two versions is that the first one increments a pointer, while the second increments an index which is then added to a pointer. So my guess would be the second one is at most one extremely fast (in terms of cycles) machine instruction more.

Try and check the machine code your compiler produces.

In general, however, the advice would be to profile if it really matters. Thinking about this kind of question prematurely usually does not give you too much back. Usually, your code's hotspots will be elsewhere where you might not suspect it at first sight.
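One way to see this on your own implementation (a sketch; the contiguity guarantee is standard, while the thin-pointer iterator is merely typical, and the helper name is mine):

```cpp
#include <vector>

// Contiguous storage is guaranteed by the standard, so the address of
// the first element equals data(); typical implementations build the
// vector iterator as a thin wrapper around exactly this pointer.
bool iterator_is_pointer_like(std::vector<int> &v) {
    return !v.empty()
        && &*v.begin() == v.data()
        && &*(v.end() - 1) == v.data() + (v.size() - 1);
}
```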

Tobias
  • 6,388
  • 4
  • 39
  • 64
  • there is a noticeable overhead when instantiating iterators. It depends how many elements you're dealing with. As long as the iterators are cached, the cost should be minimal. I also recommend avoiding the iterator way when dealing with recursive functions, for that reason. – mchiasson Mar 15 '15 at 12:54
0

Here's some code I wrote, compiled in Code::Blocks v12.11 using the default MinGW compiler. It creates a huge vector, then accesses each element using iterators, at(), and indexing. Each is looped once calling for the last element/size by function every pass, and once with the last element/size cached in a temporary.

Timing is done using GetTickCount.

#include <iostream>
#include <windows.h>
#include <vector>
using namespace std;

int main()
{
    cout << "~~ Vector access speed test ~~" << endl << endl;
    cout << "~ Initialization ~" << endl;
    long long t;
    int a;
    vector <int> test (0);
    for (int i = 0; i < 100000000; i++)
    {
        test.push_back(i);
    }
    cout << "~ Initialization complete ~" << endl << endl;


    cout << "     iterator test: ";
    t = GetTickCount();
    for (vector<int>::iterator it = test.begin(); it < test.end(); it++)
    {
        a = *it;
    }
    cout << GetTickCount() - t << endl;



    cout << "Optimised iterator: ";
    t=GetTickCount();
    vector<int>::iterator endofv = test.end();
    for (vector<int>::iterator it = test.begin(); it < endofv; it++)
    {
        a = *it;
    }
    cout << GetTickCount() - t << endl;



    cout << "                At: ";
    t=GetTickCount();
    for (int i = 0; i < test.size(); i++)
    {
        a = test.at(i);
    }
    cout << GetTickCount() - t << endl;



    cout << "      Optimised at: ";
    t = GetTickCount();
    int endof = test.size();
    for (int i = 0; i < endof; i++)
    {
        a = test.at(i);
    }
    cout << GetTickCount() - t << endl;



    cout << "             Index: ";
    t=GetTickCount();
    for (int i = 0; i < test.size(); i++)
    {
        a = test[i];
    }
    cout << GetTickCount() - t << endl;



    cout << "   Optimised Index: ";
    t = GetTickCount();
    int endofvec = test.size();
    for (int i = 0; i < endofvec; i++)
    {
        a = test[i];
    }
    cout << GetTickCount() - t << endl;

    cin.ignore();
}

Based on this, my results were that the "optimised" versions are faster than the "non-optimised" ones, and that iterators are slower than vector.at(), which is slower than direct indexing.

I suggest you compile and run the code for yourselves.

EDIT: This code was written back when I had less experience with C/C++. A further test case would be to use prefix instead of postfix increment operators; that should improve the running time.

ithenoob
  • 326
  • 1
  • 2
  • 11
0

Only slightly tangential to the original question, but the fastest loop would be

for( size_t i=size() ; i-- ; ) { ... }

which of course counts down. This can give a substantial saving if your loop has a large number of iterations but its body contains only a small number of very fast operations.

So with the [] operator access, this might be faster than many of the examples already posted.
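Spelled out on a vector, the pattern looks like this (a sketch with a made-up summing body; whether it actually beats a forward loop on a modern optimizing compiler is debated in the comments):

```cpp
#include <cstddef>
#include <vector>

long sum_down(const std::vector<int> &v) {
    long n = 0;
    // i-- decrements and tests in a single expression: the body runs for
    // i = size()-1 down to 0, and the loop exits once i-- yields 0.
    for (std::size_t i = v.size(); i--; )
        n += v[i];
    return n;
}
```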

  • Without benchmarks, and probably even after that, this is just persistent myth based on vague ideas about machine code. Counting down is not necessarily faster all these decades later, and/or compilers can optimise things like this better than coders in any case. (And this comes from me, who often _does_ count down, out of reflex. I don't claim it matters, though.) If only we were all still targeting Z80s, where this would be relevant! – underscore_d May 11 '17 at 18:16
  • Wrong, wrong, wrong: this is *not* "just a persistent myth" based on vague ideas about machine code. How dare you, sir! Indeed I have benchmarked this; counting down in this way, because the decrement and the evaluation combine into a single step, results in fewer machine instructions - look at the assembled code and it is faster. In my original posting I mentioned you only see a sizable difference if you have a large number of elements and the content of the loop is extremely lightweight. If the loop is large, the overhead of counting up or down becomes insignificant. – jam spandex May 12 '17 at 19:04
  • There's very little we could do in a loop where the difference would matter. And even that idea of a difference assumes folk writing equivalent loops, but which count up, don't get the optimisation free from the compiler anyway if they compile with decent optimisations. What was the body of the loop, & which optimisation settings did you use, where this gave "a substantial saving"? But anyway, ultimately my point is this kind of thing is rarely worth worrying about, & if we're going to tell folk to spend time altering how they code, there are many much more productive things they could look at – underscore_d May 17 '17 at 20:53
  • So you concede this is *not* a myth. I agree that aggressive optimisation renders such differences mostly irrelevant and will most likely end up producing the same code - a case in point is "use postfix rather than prefix" suggested by ithenoob - this *is* a myth: every compiler I have ever used generates the exact same machine instructions for both cases if the return value is not used, even with *no* optimisation. I was quite clear that the actual looping will only matter if the loop body is very light. Everyone else seemed to ignore this fact and your now updated point just seems to agree – jam spandex Aug 06 '17 at 00:35