I've reduced my code down to the following, which is as simple as I could make it whilst retaining the compiler output that interests me.
void foo(const uint64_t used)
{
uint64_t ar[100];
for(int i = 0; i < 100; ++i)
{
ar[i] = some_global_array[i];
}
const uint64_t mask = ar[0];
if((used & mask) != 0)
{
return;
}
bar(ar); // Not inlined
}
Using VC10 with /O2 and /Ob1, the generated assembly pretty much reflects the order of instructions in the above C++ code. Since the local array ar
is only passed to bar()
when the condition fails, and is otherwise unused, I would have expected the compiler to optimize to something like the following.
if((used & some_global_array[0]) != 0)
{
return;
}
// Now do the copying to ar and call bar(ar)...
Is the compiler not doing this because it's simply too hard for it to identify such optimizations in the general case? Or is it following some strict rule that forbids it from doing so? If so, why, and is there some way I can give it a hint that doing so wouldn't change the semantics of my program?
Note: obviously it would be trivial to obtain the optimized output by just rearranging the code, but I'm interested in why the compiler won't optimize in such cases, not how to do so in this (intentionally simplified) case.