1

Is it faster to do something like

for ( int * pa(arr), * pb(arr+n); pa != pb; ++pa )
{ 
   // do something with *pa
}

than

for ( size_t k = 0; k < n; ++k )
{ 
   // do something with arr[k]
}

???

I understand that arr[k] is equivalent to *(arr+k), but in the first method you are using the current pointer which has incremented by 1, while in the second case you are using a pointer which is incremented from arr by successively larger numbers. Maybe hardware has special ways of incrementing by 1 and so the first method is faster? Or not? Just curious. Hope my question makes sense.

3 Answers3

2

If the compiler is smart enought (and most of compilers is) then performance of both loops should be ~equal.

For example I have compiled the code in gcc 5.1.0 with generating assembly:

int __attribute__ ((noinline)) compute1(int* arr, int n)
{
  int sum = 0;
  for(int i = 0; i < n; ++i)
  {
    sum += arr[i];
  }
  return sum;
}

int __attribute__ ((noinline)) compute2(int* arr, int n)
{
  int sum = 0;
  for(int * pa(arr), * pb(arr+n); pa != pb; ++pa)
  {
    sum += *pa;
  }
  return sum;
}

And the result assembly is:

compute1(int*, int):
    testl   %esi, %esi
    jle .L4
    leal    -1(%rsi), %eax
    leaq    4(%rdi,%rax,4), %rdx
    xorl    %eax, %eax
.L3:
    addl    (%rdi), %eax
    addq    $4, %rdi
    cmpq    %rdx, %rdi
    jne .L3
    rep ret
.L4:
    xorl    %eax, %eax
    ret
compute2(int*, int):
    movslq  %esi, %rsi
    xorl    %eax, %eax
    leaq    (%rdi,%rsi,4), %rdx
    cmpq    %rdx, %rdi
    je  .L10
.L9:
    addl    (%rdi), %eax
    addq    $4, %rdi
    cmpq    %rdi, %rdx
    jne .L9
    rep ret
.L10:
    rep ret
main:
    xorl    %eax, %eax
    ret

As you can see, the most heavy part (loop) of both functions is equal:

.L9:
    addl    (%rdi), %eax
    addq    $4, %rdi
    cmpq    %rdi, %rdx
    jne .L9
    rep ret

But in more complex examples or in other compiler the results might be different. So you should test it and measure, but most of compilers generate similar code.

The full code sample: https://goo.gl/mpqSS0

AdamF
  • 2,501
  • 17
  • 30
0

This cannot be answered. It depends on your compiler AND on your machine.

A very naive compiler would translate the code as is to machine code. Most machines indeed provide an increment operation that is very fast. They normally also provide relative addressing for an address with an offset. This could take a few cycles more than absolute addressing. So, yes, the version with pointers could potentially be faster.

But take into account that every machine is different AND that compilers are allowed to optimize as long as the observable behavior of your program doesn't change. Given that, I would suggest a reasonable compiler will create code from both versions that doesn't differ in performance.

0

Any reasonable compiler will generate code that is identical inside the loop for these two choices - I looked at the code generated for iterating over a std::vector, using for-loop with an integer for the iterator or using a for( auto i: vec) type construct [std::vector internally has two pointers for the begin and end of the stored values, so like your pa and pb]. Both gcc and clang generates identical code inside the loop itself [the exact details of the loop is subtly different between the compilers, but other than that, there's no difference]. The setup of the loop was subtly different, but unless you OFTEN do loops of less than 5 items [and if so, why do you worry?], the actual content of the loop is what matters, not the bit just before the actual loop.

As with ALL code where performance is important, the exact code, compiler make and version, compiler options, processor make and model, will make a difference to how the code performs. But for the vast majority of processors and compilers, I'd expect no measurable difference. If the code is really critical, measure different alternatives and see what works best in your case.

Mats Petersson
  • 126,704
  • 14
  • 140
  • 227