Performance: Pointer dereferencing in C++

Question

Every single time I loop through an array inside a class, which I access via a pointer, I ask myself this very same question:

Does each iteration produce overhead by dereferencing of the pointer? Do dereference-chains add up? For example:

ClassA *a = new ClassA();
ClassB *b = new ClassB();

for( int i = 0; i < 10; i++){

   a->b->array[i].foo();

}

If I had to guess, I would say this involves 20 dereference steps, one for each pointers, 10 iterations. But I could just as well imagine, that its reduced to 10 because the chained pointers are translated to a single one by the compiler. I could even imagine, that it is reduced to 1, because of some caching-voodoo or something.

Can someone tell, and maybe explain to me, how this behaves performance-wise? Would really appreciate it!

Btw, I know that kind of similar questions have already been answered here, but I was not able to deduce the answer for this specific topic. So please dont blame me for bringing up this topic again.

Is b a member of A? What does the globally scoped b instance of ClassB have anything to do with the unknown object b that is the member of A? — franji1, Aug 04 '17 at 02:39
"*If I had to guess, I would say this involves 20 dereference steps*" - 30, actually. `a->`, `b->`, and `array[i]` are all pointer dereferences, so you have 3 dereferences per loop iteration, for 10 iterations. — Remy Lebeau, Aug 04 '17 at 04:31

score 6 · Accepted Answer · answered Aug 04 '17 at 02:43

It's really up to the compiler (and in particular, the optimizer) how it wants to generate code. Under the as-if rule, as long as the user can't tell the difference in how the program behaves externally, the compiler can do whatever it wants, and modern compilers can get quite clever regarding the optimizations they apply.

In practice, I think the most modern optimizers will be unable to optimize the loop only if they can't tell what happens inside foo() -- in particular, if they can't guarantee that foo()'s implementation will not change the values of a or b, then they will be forced to generate code that does a separate dereference of a and b for every loop iteration, just to make sure that the right thing happens even if a or b's values change.

You can find out for yourself what happens if you don't mind reading a bit of assembly code -- just compile the program to assembly, with optimization enabled (e.g. g++ -O3 -S mytest.cpp) and read the resulting mytest.S file to see what the compiler did. Try it with foo() implemented in the same file (so that the compiler can definitely see foo()'s contents) and with foo implemented in a different file (so that the compiler may have to treat foo() as a 'black box') and see what difference that makes.

Thanks a lot, I didnt even consider the fact, that foo() could change one of the pointers, but thats actually a quite interesting thing to consider, when guessing what the compiler is gonna do. I have to admit, that I never read a single line of assembly before, but I guess if I am interested in optimizing, this is the right time to start :D — user3808217, Aug 04 '17 at 02:53

score 3 · Answer 2 · answered Aug 04 '17 at 02:50

3

You can be sure to get rid of some dereferencing by doing things like this:

// create a pointer to the b class outside of the loop
ClassB * bptr = a->b;        

// use the pointer inside the loop
for( int i = 0; i < 10; i++){

    bptr->array[i].foo();

}

answered Aug 04 '17 at 02:50

ttemple

852
5
20

3

And take it further by saving a pointer to `array` before entering the loop, and have the loop increment that pointer on each iteration instead of using the index operator. What was once 30 dereferences is reduced to 10, instead of 20 as in your example. – Remy Lebeau Aug 04 '17 at 04:29

score 1 · Answer 3 · answered Aug 04 '17 at 04:51

1

I would expect 1 memory access, because a and a->b don't change within the loop, so there is no need to fetch them again. Also all the values of i are known for a->b->array[i], so it can be prefetched.

answered Aug 04 '17 at 04:51

akintayo

176
2
7

Performance: Pointer dereferencing in C++

3 Answers3