6

Some people said: "Any operation that can be achieved by array subscripting can also be done with pointers. The pointer version will in general be faster".

I doubt about the outcome of the above, so I do the following test:

In the following article, We do not care compiler optimization. About compiler optimization how to affect the efficiency between pointer and array, please note: Efficiency: arrays vs pointers

(Visual Studio 2010, Debug Mode, no optimizations)

#include <windows.h>
#include <stdio.h>

int main()
{
    int a[] = {10,20,30};
    int* ap = a;

    long counter;

    int start_time, end_time;
    int index;

    start_time = GetTickCount();
    for (counter = 1000000000L; counter>0; counter--)
    {
        *(ap+1) = 100;
    }
    end_time = GetTickCount();
    printf("10 billion times of *ap = %d\n", end_time-start_time);

    start_time = GetTickCount();
    for (counter = 1000000000L; counter>0; counter--)
    {
        a[1] = 101;
    }
    end_time = GetTickCount();
    printf("10 billion times of a[0] = %d\n", end_time-start_time);

    return 0;
}

the result is:

10 billion times of *ap = 3276
10 billion times of a[0] = 3541

The pointer seems to be a little fast. But after I compared the dis-assemble, I fell into a deeper confusion。

(Visual Studio 2010, Debug Mode, no optimizations)

; 17   :         *(ap+1) = 100;
mov eax, DWORD PTR _ap$[ebp]
mov DWORD PTR [eax+4], 100          ; 00000064H

; 25   :         a[1] = 101;
mov DWORD PTR _a$[ebp+4], 101       ; 00000065H

From the assemble output, memory access through a pointer takes 2 instructions and array only takes 1 instruction.

Why array execute less instructions but it doesn't takes less time than pointer?

Does it associated with the cpu cache? How can I modify my test code to prove it?

Community
  • 1
  • 1
ajaxhe
  • 571
  • 1
  • 5
  • 13
  • 6
    never optimize in Debug mode... – TemplateRex Apr 05 '13 at 13:25
  • 2
    Many years ago I tested exactly this. The array indexing version was faster. The statement is rubbish, either could be faster, it just depends. – john Apr 05 '13 at 13:25
  • It just depends... on hardware? – Philip Apr 05 '13 at 13:26
  • 11
    It's not useful to profile on a Debug build. – Roger Rowland Apr 05 '13 at 13:27
  • 7
    `Some people said: "Any operation that can be achieved by array subscripting can also be done with pointers. The pointer version will in general be faster".` These people were very silly. – Lightness Races in Orbit Apr 05 '13 at 13:29
  • 2
    Note that `a[1]` is exactly equivalent to `*(a + 1)`. Subscripting arrays *is* just pointer arithmetic. – Joseph Mansfield Apr 05 '13 at 13:33
  • 1
    265 ticks difference over 10 *billion* iterations? That's so far down in the noise it isn't worth worrying about. The performance is essentially identical. – John Bode Apr 05 '13 at 14:39
  • @john From assemble output, array execute less instructions than pointer mode, Why the execution time of the former is not the half of the latter? – ajaxhe Apr 06 '13 at 00:38
  • 1
    @ajaxhe: Not all instructions take the same number of cycles to execute. – John Bode Apr 06 '13 at 00:41
  • @JohnBode What is the exactly mean about "Not all instructions take the same number of cycles to execute.", sorry for my poor assembly background. – ajaxhe Apr 06 '13 at 01:11
  • @ajaxhe : [This document](http://cs.smith.edu/~thiebaut/ArtOfAssembly/CH03/CH03-3.html#HEADING3-125) gives a pretty good explanation. – John Bode Apr 06 '13 at 01:45
  • @JohnBode I got some ideas from the link that you provided. Both the cache and pipeline can affect the execute time. – ajaxhe Apr 06 '13 at 08:15

1 Answers1

2

Firstly and most importantly, the C language doesn't have speed. That's an attribute introduced by implementations of C. For example, C doesn't have speed, but the GCC compiler produces code that might vary in speed from the code produced by Clang compiler, and they're both likely to produce code that out-performs the behavior produced by the Cint or Ch interpreters. All of these are C implementations. Some of them are slower than others, but the speed can't be attributed to C in anyway!

6.3.2.1 of the C standard says:

Except when it is the operand of the sizeof operator, the _Alignof operator, or the unary & operator, or is a string literal used to initialize an array, an expression that has type ‘‘array of type’’ is converted to an expression with type ‘‘pointer to type’’ that points to the initial element of the array object and is not an lvalue.

This ought to be an indication that both *(ap+1) and a[1] in your code are pointer operations. This translation will happen during the compilation phase in Visual Studio. Hence, this shouldn't have any impact on runtime.

6.5.2.1 in regards to "array subscripting" says:

One of the expressions shall have type ‘‘pointer to complete object type’’, the other expression shall have integer type, and the result has type ‘‘type’’. This seems to indicate that the array subscript operator is actually a pointer operator...

This is confirmation that ap[1] is indeed a pointer operation, as we postulated earlier. At runtime, however, the array has already been translated to a pointer. The performance should be identical.

... so, why aren't they identical?

What are the characteristics of the OS you're using? Isn't it a multitasking, multi-user OS? Suppose the OS were to complete the first loop without interruption, but then interrupt the second loop and switch control to a different process. Wouldn't this interruption invalidate your experiment? How do you measure the frequency and timing of interruptions caused by task switching? Note that this will differ for different OSes, and the OS is part of the implementation.

What are the characteristics of the CPU you're using? Does it have it's own fast, internal cache for machine code? Suppose your entire first loop, and it's encompassing timing mechanism were to fit into the code cache nicely, but the second loop were truncated. Wouldn't this result in a cache miss, and a long wait while your CPU fetches the remainder of the code from RAM? How do you measure the timing of interruptions caused by cache misses? Note that this will differ for different CPUs, and the CPU is part of the implementation.

These questions ought to raise some questions such as "Does this micro-optimisation benchmark solve a meaningful or important problem?". The success of an optimisation will differ depending upon the size and complexity of the problem. Find an important problem, solve it, profile the solution, optimise it and profile it again. That way, you can give meaningful information about how much faster the optimised version is. Your boss will be much happier with you, providing you don't divulge that the optimisations are probably only relevant for your implementation, as I mentioned earlier. I'm certain you'll find that the least of your worries will be array dereference vs pointer dereference.

autistic
  • 1
  • 3
  • 35
  • 80
  • @GrijeshChauhan Thanks. I appreciate the formatting, but the other edits were either incorrect or insignificant. I meant "[cint](http://root.cern.ch/drupal/content/cint)". There are two correct ways to spell "behavior": [The American-English way](http://www.merriam-webster.com/dictionary/behavior), and [the British way](http://www.merriam-webster.com/dictionary/behaviour). Please, don't go out of your way to correct that. You'll end up with arthritis quicker than you realise... – autistic Apr 05 '13 at 15:49
  • Ooh..Sorry for that..It was by mistake..BTW nice explanation So I took interest. – Grijesh Chauhan Apr 05 '13 at 15:52
  • @lvalue The different outcome maybe affected by cpu cache, but how can prove it? – ajaxhe Apr 06 '13 at 01:08
  • @ajaxhe You could swap your code around so the pointer test runs first... but you've missed the point: There is no point to this form of micro-optimisation. Read through the entire answer with that point in mind, and you might understand my answer better. Solve an important problem, profile your solution and you'll find you're targeting one of the least effective optimisations. – autistic Apr 06 '13 at 03:02
  • Yes, you are right. There is no point to this form of micro-optimisation. CPU cache and pipeline should be also considered in implementation. Thanks! – ajaxhe Apr 06 '13 at 08:19