5

If there's function that takes a pointer to struct as parameter and the function has a loop that access a member at every iteration like:

int function_a(struct b *b)
{
    // ... do something then
    for(int i = 0; i < 500; ++i){
        b->c->d[i] = value();
    }
}

Is it going to retrieve the location c points to and d points to from memory every time?

Now consider the following situation:

int function_a(struct b *b)
{
    // ... do something then
    float *f = b->c->d;
    for(int i = 0; i < 500; ++i){
        f[i] = value();
    }
}

Would that be faster?

2013Asker
  • 2,008
  • 3
  • 25
  • 36
  • Optimizers will do automatically what you did manually, when they can tell it is safe to do so. If you know it is safe, do the optimization. There are other things that could screw this up, like `volatile` qualifiers, but they're rare. – Jonathan Leffler Dec 31 '13 at 19:18
  • Depends on how smart the compiler is. It should be able to see that b->c->d (the pointer) is constant, but then again your compiler may be dumb. – IdeaHat Dec 31 '13 at 19:19
  • 6
    The performance of the loop is likely to be dominated by the performance of `value()`. The rest is probably premature optimization unless you've measured that it is a performance inhibitor. – Jonathan Leffler Dec 31 '13 at 19:26
  • The second option would be faster, don't rely on theoretical optimizations when you can be sure about it. – 0xFADE Dec 31 '13 at 19:27
  • @Jonathan Leffler: That depends on what value() is. It could be a simple getter that gets inlined easily. – user2345215 Dec 31 '13 at 19:28
  • 2
    @JonathanLeffler +1 for mentioning measuring. Its (almost) always better to measure in these cases then to try and speculate on what optimizations the compiler can figure out. – IdeaHat Dec 31 '13 at 19:28
  • @MadScienceDreams: It primarily depends on what `value()` does... And whether the compiler knows what it does. Alias analysis is notoriously hard, so this sort of micro-optimization is often beneficial. Also I would argue the latter form is easier to read. – Nemo Dec 31 '13 at 19:29
  • 1
    Also, just from the point of view of the maintainer, I prefer seeing a local variable with a proper name than a long chain of derreferences... `b->c->d[i]` might be easier to read as `meaningfulName[i]`. – David Rodríguez - dribeas Dec 31 '13 at 19:31
  • @DavidRodríguez-dribeas: [The Law of Demeter](http://en.wikipedia.org/wiki/Law_of_Demeter) might be applicable, or worth thinking about. I agree that the long chain is less desirable. – Jonathan Leffler Dec 31 '13 at 19:33
  • 1
    @MadScienceDreams, while written about a different language, the message Shawn Hargreaves gives in his blog is a good one: [Why Measure When You Can Guess?](http://blogs.msdn.com/b/shawnhar/archive/2009/07/06/why-measure-when-you-can-guess.aspx) – Brian S Dec 31 '13 at 19:34
  • 1
    If you can make the pointer constant, that might also help the optimizer figure things out. Also, for what it's worth, you don't need to repeat the 'struct' keyword when declaring a variable of that type (you do in C, not in C++). It also terrifies me that your argument name is the same as the type - but I'll assume the original source is different. – Mark Dec 31 '13 at 20:01
  • @Mark: Making the pointer `const` will definitely not help the optimizer figure anything out. (Declaring pointer-to-const, were appropriate, is certainly good style... But it has nothing to do with optimization, ever.) See http://stackoverflow.com/questions/6313730/. – Nemo Dec 31 '13 at 23:18
  • @Nemo - intriguing. Thanks! I wonder whether the `immutable` keyword in the D language provides a boon then, as that can guarantee constness from all viewpoints. (something for me to ponder) – Mark Dec 31 '13 at 23:22
  • 1
    @Mark: Possibly; I do not know D :-). Combining `const` with `restrict` on a pointer actually does what you want... But `restrict` is only available in C, not C++ (except as a non-portable extension). And `const` _objects_ actually are immutable; `const Foo x = ...;` is very different from `const Foo *x = ...;` for optimization purposes. – Nemo Jan 01 '14 at 02:58

3 Answers3

7

I urge you to heed Thomas Matthews' advice regarding profiling, however to answer your question: it depends.

This particular transformation is also known as code hoisting which consists in moving code without side-effect and with the same result at each call outside of a loop. As noted though, this is only performed if the compiler can prove:

  • that there is no side-effect
  • that the same result is computed at each call

In both cases, this basically means that the compiler should have access to the full code (see the definitions) of both:

  • the expression itself, to prove the absence of side-effect
  • anything that might alter something about the expression, to prove that the same result is computed each time

Therefore, it is actually unlikely that it will perform the optimization unless all the code for the body loop is included in the headers (and thus can be inlined) because any opaque function could hide a modification of b->c (for example) through an evil global variable.

In your example, nothing proves that value() does not change b->c... so no, the compiler would be wrong to hoist the code unless it has access to the definition of value() and can rule this possibility out.

Matthieu M.
  • 287,565
  • 48
  • 449
  • 722
4

Using a temporary may not be faster than accessing a temporary: depends on the platform.

When in doubt, look at the assembly language generated by the compiler. On the ARM processor, when accessing the memory:

  • A register is loaded with the address of the variable.
  • The register is dereferenced to obtain the value (and stored in another register).

This is very similar to dereferencing a pointer:

  • A register is loaded with the pointer value.
  • The register is dereferenced to obtain the value.

There may be a second load from memory to fetch the pointer value. The truth is in the assembly language.

This is known as a micro-optimization and should only be applied as a last resort to speed up code in performance critical areas. Use a profiler to find out where the bottlenecks are and address those first.

Thomas Matthews
  • 56,849
  • 17
  • 98
  • 154
  • 3
    I think this answer misses the core of the question, the question is whether `b->c->d` could have an impact over `p` where `p` is a pointer holding the value of `b->c->d`. In both cases we are dealing with a pointer, not with a value. The question is whether the code explicitly precomputes the pointer to use before the loop or on each iteration. Then again, the answer has been accepted, so I might have misunderstood the question :) – David Rodríguez - dribeas Dec 31 '13 at 19:33
  • David Rodriguez is correct, the question is about a multiple hop pointer sered as opposed to accessing a single pointer directly. – kfsone Dec 31 '13 at 19:56
  • The only way to know is to look at the assembly language. One could speculate that loading the pointers `b, c, d` into register variables before the loop may increase performance. The compiler may already do this on higher levels of optimization. But then, what is the performance gain, nanoseconds, microseconds, milliseconds? All of which may be negligible when the program performs I/O or waits for a GUI event. – Thomas Matthews Dec 31 '13 at 19:58
  • Or one could read the original question. `b->c->d[i]` vs `f[i]`. So the answer part of your answer is not related to the question (since it ignores indirection), only the caution at the end is pertinent and that should have been a comment rather than an answer. – kfsone Dec 31 '13 at 22:03
0

You don't make it clear why you are focusing on this, whether you have already done some performance analysis and homed in on this routine or whether you are doing "blind mans optimization" - which consists of looking at the code and saying "maybe that's slow".

Let me first address your up-front question:

 a->b->c[i]

vs

 f[i]

If you compile these two pieces of code without optimization then there is a very high probability that f[i] will be faster.

As soon as you enable optimization all bets are off. Firstly, which architecture you are using is unknown, so the cost of the sequential fetches in a->b->c is unknown, also we don't know how many registers are available, or what optimizations the compiler might use. It is conceivable that the cost of any single write might be high enough that, if the CPU uses pipelining, the writes take long enough to make it irrelevant whether we spend some time doing pointer math between writes or not.

As a somewhat experienced optimizer, I'd be more interested in "what does value() do?". Can the compiler be certain that value() does not modify any of the values in a, a-> or a->b->c?

If you absolutely, definitively know that these values won't change, have done perf analysis and found that this loop is a bottleneck, looked at the assembler to determine that the compiler does not emit the most efficient code, then you might optimize it as follows:

int function_a(struct b* const b)
{
    /// optimization: we found XYC compiler for Leg architecture was
    /// emitting instructions that repeatedly fetched the array base
    /// address every iteration.
    float* const end = f + 500;
    for (float* it = b->c->d; it < end; ++it)
        *it = value();
}

HOWEVER: Making such a low-level optimization carries a risk. C/C++ optimizers these days can be pretty smart. One way to keep them from generating the most efficient code possible is to start hand-optimizing things.

What we've done here is made an efficiently tight loop, but that may not be the most efficient way to achieve the result in assembly.

In the i = 0; i < 500 case, depending on the implementation of value(), may actually stride or vectorize the loop in such a way as to keep the memory bus busy, or it might use special, wide registers to do multiple operations at a time. Our optimization may create a pathelogical scenario whereby we force the compiler to emit the least efficient order of operations.

Again - we don't know the reasons you are focusing on this part of the code, but in practice I've always found it is very unlikely you will gain much by hand-optimizing this part of a loop.

If you are developing under Linux, you may want to look into valgrind to assist you with perf analysis. If you are developing under Visual Studio then "Analyze" -> "Performance and Diagnostics" (ctrl-alt-f9) will bring up the perf wizard. Click start and select "Instrumentation".

shaedrich
  • 5,457
  • 3
  • 26
  • 42
kfsone
  • 23,617
  • 2
  • 42
  • 74