5

Given two functions, which should be faster, if there is any difference at all? Assume that the input data is very large.

void iterate1(const char* pIn, int Size)
{
   for ( int offset = 0; offset < Size; ++offset )
   {
      doSomething( pIn[offset] );
   }
}

vs

void iterate2(const char* pIn, int Size)
{
   const char* pEnd = pIn+Size;
   while(pIn != pEnd)
   {
      doSomething( *pIn++ );
   }
}

Are there other issues to be considered with either approach?

Snazzer

11 Answers

10

Chances are, your compiler's optimizer will create a loop induction variable for the first case to turn it into the second. I'd expect no difference after optimizations so I tend to prefer the first style because I find it clearer to read.

Boojum
  • Yes - pointing out that fine-level implementation is a compiler detail/concern. (Although I find the second style more readable :-) –  Nov 02 '09 at 20:23
8

Boojum is correct - IF your compiler has a good optimizer and you have it enabled. If that's not the case, or your use of arrays isn't sequential and amenable to optimization, using array offsets can be far, far slower.

Here's an example. Back about 1988, we were implementing a window with a simple teletype interface on a Mac II. This consisted of 24 lines of 80 characters. When you got a new line in from the ticker, you scrolled up the top 23 lines and displayed the new one on the bottom. When there was something on the teletype, which wasn't all the time, it came in at 300 baud, which with the serial protocol overhead was about 30 characters per second. So we're not talking something that should have taxed a 16 MHz 68020 at all!

But the guy who wrote this did it like:

char screen[24][80];

and used 2-D array offsets to scroll the characters like this:

int i, j;
for (i = 0; i < 23; i++)
  for (j = 0; j < 80; j++)
    screen[i][j] = screen[i+1][j];

Six windows like this brought the machine to its knees!

Why? Because compilers were stupid in those days, so in machine language, every instance of the inner loop assignment, screen[i][j] = screen[i+1][j], looked something like this (Ax and Dx are CPU registers):

Fetch the base address of screen from memory into the A1 register
Fetch i from stack memory into the D1 register
Multiply D1 by a constant 80
Fetch j from stack memory and add it to D1
Add D1 to A1
Fetch the base address of screen from memory into the A2 register
Fetch i from stack memory into the D1 register
Add 1 to D1
Multiply D1 by a constant 80
Fetch j from stack memory and add it to D1
Add D1 to A2
Fetch the value from the memory address pointed to by A2 into D1
Store the value in D1 into the memory address pointed to by A1

So we're talking 13 machine language instructions for each of the 23x80=1840 inner loop iterations, for a total of 23920 instructions, including 3680 CPU-intensive integer multiplies.

We made a few changes to the C source code, so then it looked like this:

int i, j;
register char *a, *b;
for (i = 0; i < 23; i++)
{
  a = screen[i];
  b = screen[i+1];
  for (j = 0; j < 80; j++)
    *a++ = *b++;
}

There are still two machine-language multiplies, but they're in the outer loop, so there are only 46 integer multiplies instead of 3680. And the inner loop *a++ = *b++ statement only consisted of two machine-language operations.

Fetch the value from the memory address pointed to by A2 into D1, and post-increment A2
Store the value in D1 into the memory address pointed to by A1, and post-increment A1.

Given there are 1840 inner loop iterations, that's a total of 3680 CPU-cheap instructions - 6.5 times fewer - and NO integer multiplies. After this, instead of the machine dying at six teletype windows, we were never able to open enough windows to bog it down - we ran out of teletype data sources first. And there are ways to optimize this much, much further, as well.

Now, modern compilers will do that kind of optimization for you - IF you ask them to do it, and IF your code is structured in a way that permits it.

But there are still circumstances where compilers can't do that for you - for instance, if you're doing non-sequential operations in the array.

So I've found it's served me well to use pointers instead of array references whenever possible. The performance is certainly never worse, and frequently much, much better.

Bob Murphy
  • Wouldn't it be better to wait for the profiler to tell you the loop was where the slowdown was occurring? – Steve Rowe Nov 03 '09 at 00:21
  • 1
    @Steve Rowe: Profilers aren't always available. There weren't any on the Mac in 1987, and there aren't really any good ones for Linux on ARM processors today. Also, adeptness with pointers is really the first step on the road from being a C/C++ beginner to a master. If you can't write pointer-based code during interviews, many places in Silicon Valley won't hire you. And if you can't read it, you'll be hopelessly lost in code including the Linux and Mac OS kernels, the X server, and many device drivers. – Bob Murphy Nov 03 '09 at 06:13
  • 2
    @Steve Rowe: Also, I can say from experience: any time you have loops nested two or more deep, the innermost code is likely to be at least a potential performance bottleneck. – Bob Murphy Nov 03 '09 at 06:16
3

With a modern compiler there shouldn't be any difference in performance between the two, especially with such simple, easily recognizable examples. Moreover, even if the compiler does not recognize their equivalence, i.e. translates each version "literally", there still shouldn't be any noticeable performance difference on a typical modern hardware platform. (Of course, there might be more specialized platforms out there where the difference is noticeable.)

As for other considerations... Conceptually, when you implement an algorithm using the index access you impose a random-access requirement on the underlying data structure. When you use a pointer ("iterator") access, you only impose a sequential-access requirement on the underlying data structure. Random-access is a stronger requirement than sequential-access. For this reason I, for one, in my code prefer to stick to pointer access whenever possible, and use index access only when necessary.

More generally, if an algorithm can be implemented efficiently through sequential access, it is better to do it that way, without involving the unnecessary stronger requirement of random-access. This might prove useful in the future, should a need arise to refactor the code or to change the algorithm.

AnT stands with Russia
2

They are almost identical. Both solutions involve a temporary variable, an increment of a word on your system (int or ptr), and a logical check, which should take one assembly instruction.

The only difference I see is that the array lookup

arr[idx]

might require pointer arithmetic and then a fetch, while the dereference

*ptr

requires just a fetch.

My advice is that if it really matters, implement both and see if there's any savings.

Doug T.
2

To be sure, you must profile in your intended target environment.

That said, my guess is that any modern compiler is going to optimize them both down to very similar (if not identical) code.

If you didn't have an optimizer, the second has a chance of being faster, because you aren't re-computing the pointer on every iteration. But unless Size is a VERY large number (or the routine is called quite often), the difference isn't going to matter to your program's overall execution speed.

Michael Kohne
  • 1
    Intel has addressing modes that apply the offset directly, so the pointer isn't recomputed explicitly. I've seen the first loop run faster. – Mark Ransom Nov 02 '09 at 20:49
2

Several years ago I asked this exact question. Someone in an interview was failing a candidate for picking the array notation because it was supposedly obviously slower. At that point I compiled both versions and looked at the disassembly. There was one opcode extra in the array notation. This was with Visual C++ (.net?). Based on what I saw I concluded that there is no appreciable difference.

Doing this again, here is what I found:

    iterate1(arr, 400); // array notation
011C1027  mov         edi,dword ptr [__imp__printf (11C20A0h)] 
011C102D  add         esp,0Ch 
011C1030  xor         esi,esi 
011C1032  movsx       ecx,byte ptr [esp+esi+8] <-- Loop starts here
011C1037  push        ecx  
011C1038  push        offset string "%c" (11C20F4h) 
011C103D  call        edi  
011C103F  inc         esi  
011C1040  add         esp,8 
011C1043  cmp         esi,190h 
011C1049  jl          main+32h (11C1032h) 

    iterate2(arr, 400); // pointer offset notation
011C104B  lea         esi,[esp+8] 
011C104F  nop              
011C1050  movsx       edx,byte ptr [esi] <-- Loop starts here
011C1053  push        edx  
011C1054  push        offset string "%c" (11C20F4h) 
011C1059  call        edi  
011C105B  inc         esi  
011C105C  lea         eax,[esp+1A0h] 
011C1063  add         esp,8 
011C1066  cmp         esi,eax 
011C1068  jne         main+50h (11C1050h) 
Steve Rowe
  • 3
    I would fail a candidate (well maybe not fail, but discount) for going with the pointer offset for valuing microoptimization over clarity. – JohnMcG Nov 02 '09 at 19:28
  • @John, that's my thinking as well. If I were to prefer one, it would be the clearer code. Doing anything else is premature optimization. – Steve Rowe Nov 02 '09 at 20:35
  • 1
    @JohnMcG: I would fail a candidate that believed that the only reason for using a "sliding pointer" technique is microoptimization. It is not. There are other compelling reasons to prefer a pointer over an index. – AnT stands with Russia Nov 02 '09 at 21:31
  • @Doug, these days that can be a serious problem. – Steve Rowe Nov 04 '09 at 15:49
2

The pointer form used to be much faster. Now it's at most a bit faster, and the compiler may do the optimization for you.

Historically it was much faster to iterate via *p++ than p[i]; that was part of the motivation for having pointers in the language.

Plus, p[i] often requires a slower multiply op or at least a shift, so the optimization of replacing multiplies in a loop with adds to a pointer was sufficiently important to have a specific name: strength reduction. The subscript also tended to produce bigger code.

However, two things have changed: one is that compilers are much more sophisticated and are generally capable of doing this optimization for you.

The other is that the relative difference between an op and a memory access has increased. When *p++ was invented, memory and CPU op times were similar. Today, a random desktop machine can do 3 billion integer ops / second, but only about 10 or 20 million random DRAM reads. Cache accesses are faster, and the system will prefetch and stream sequential memory accesses as you step through an array, but it still costs a lot to hit memory, and a bit of subscript fiddling isn't such a big deal.

DigitalRoss
1

Why don't you try both and time them? My guess would be that they are optimized by the compiler into basically the same code. Just remember to turn on optimizations when comparing (-O3).

Emil H
  • If performance really matters to you, test the code you're considering. There are too many variables to rely on the advice of strangers on the internet. – Mark Ransom Nov 02 '09 at 20:53
1

In the "other considerations" column, I'd say approach one is more clear. That's just my opinion though.

Jordan
1

You're asking the wrong question. Should a developer aim for readability or performance first?

The first version is idiomatic for processing an array, and your intent will be clear to anyone who has worked with arrays before, whereas the second relies heavily on the equivalence between array names and pointers, forcing someone reading the code to switch metaphors several times.

Cue the comments saying that the second version is crystal clear to any developer worth his keyboard.

If you wrote your program, and it's running slow, and you have profiled to the point where you have identified this loop as the bottleneck, then it would make sense to pop the hood and look at which of these is faster. But get something clear up and running first using well-known idiomatic language constructs.

JohnMcG
  • These days, my rule of thumb is that if it's easier for a human to understand then it's also easier for a compiler to understand. – Boojum Nov 02 '09 at 20:26
1

Performance questions aside, it strikes me that the while-loop variant has a potential maintainability issue: a programmer coming along to add some new bells and whistles has to remember to put the pointer increment in the right place, whereas the for-loop variant keeps it safely out of the body of the loop.

ceo