You don't make it clear why you are focusing on this routine: whether you have already done some performance analysis and homed in on it, or whether you are doing "blind man's optimization", which consists of looking at the code and saying "maybe that's slow".
Let me first address your up-front question:
a->b->c[i]
vs
f[i]
If you compile these two pieces of code without optimization, then there is a very high probability that f[i] will be faster.
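To make the comparison concrete, here is a minimal sketch of the two forms; the struct layout, the element count of 500 and value() itself are assumptions drawn from your snippet, not known details:

struct b { float* c; };             // assumption: c points at (at least) 500 floats
struct a { struct b* b; };

float value(void);                  // assumption: defined elsewhere

void fill_indirect(struct a* const a)
{
    for (int i = 0; i < 500; ++i)
        a->b->c[i] = value();       // the pointer chain may be re-walked every iteration
}

void fill_cached(struct a* const a)
{
    float* const f = a->b->c;       // hoist the chain out of the loop once
    for (int i = 0; i < 500; ++i)
        f[i] = value();
}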
As soon as you enable optimization, all bets are off. Firstly, which architecture you are using is unknown, so the cost of the sequential fetches in a->b->c is unknown; we also don't know how many registers are available or what optimizations the compiler might apply. It is conceivable that the cost of any single write is high enough that, if the CPU pipelines them, the writes take long enough to make it irrelevant whether we spend some time doing pointer math between them or not.
As a somewhat experienced optimizer, I'd be more interested in "what does value() do?". Can the compiler be certain that value() does not modify any of the values in a, a->b or a->b->c?
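To make that concern concrete, here is a hypothetical value(), reusing the invented types from the sketch above (the global names are made up purely for illustration), that would make hoisting the pointer chain out of the loop incorrect:

struct a* g_shared;                 // hypothetical global that points at the same object as 'a'
float g_scratch[500];

float value(void)
{
    g_shared->b->c = g_scratch;     // legally redirects the array between two iterations
    return 0.0f;
}

Unless the compiler can see the body of value() and prove that nothing like this happens, it has to re-walk a->b->c after every single call.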
If you absolutely, definitively know that these values won't change, have done performance analysis and found that this loop is a bottleneck, and have looked at the assembler to confirm that the compiler does not emit the most efficient code, then you might optimize it as follows:
void function_a(struct a* const a)
{
    /// optimization: we found the XYC compiler for the Leg architecture was
    /// emitting instructions that re-fetched the array base address on
    /// every iteration.
    float* const begin = a->b->c;
    float* const end = begin + 500;
    for (float* it = begin; it < end; ++it)
        *it = value();
}
HOWEVER: Making such a low-level optimization carries a risk. C/C++ optimizers these days can be pretty smart. One way to keep them from generating the most efficient code possible is to start hand-optimizing things.
What we've done here is write a tight loop, but that may not be the most efficient way to achieve the result in assembly. In the i = 0; i < 500 case, the compiler, depending on the implementation of value(), may actually stride or vectorize the loop in such a way as to keep the memory bus busy, or it might use special, wide registers to do multiple operations at a time. Our optimization may create a pathological scenario in which we force the compiler to emit the least efficient order of operations.
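If profiling does show that the repeated fetches matter, a gentler alternative is to cache the base once but keep the simple counted loop, leaving the loop structure for the compiler to unroll or vectorize. This sketch reuses the invented names from above; the restrict qualifier is only valid if you can genuinely promise that value() never touches those floats through another pointer:

void function_a_cached(struct a* const a)
{
    // restrict promise: during this block the floats are only accessed through f
    float* restrict f = a->b->c;
    for (int i = 0; i < 500; ++i)   // known trip count: easy to unroll or vectorize
        f[i] = value();
}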
Again, we don't know the reasons you are focusing on this part of the code, but in practice I've always found it very unlikely that you will gain much by hand-optimizing this kind of loop.
If you are developing under Linux, you may want to look into Valgrind to assist you with performance analysis. If you are developing under Visual Studio, then "Analyze" -> "Performance and Diagnostics" (Ctrl-Alt-F9) will bring up the performance wizard. Click Start and select "Instrumentation".