I'm writing some logging code in C for an ARM9 processor. This code records some data if a dynamic module is present. The module will usually not be present in a production build, but the logging code will always be compiled in. The idea is that if a customer encounters a bug, we can load this module, and the logging code will dump debugging information.
The logging code must have minimal impact when the module is not present, so every cycle counts. In general, the logging code looks something like this:
__inline void log_some_stuff(Provider *pProvider, other args go here...)
{
    if (NULL == pProvider)
        return;
    ... logging code goes here ...
}
With optimization on, RVCT 4.0 generates code that looks like this:
ldr r4,[r0,#0x2C]   ; pProvider,[r0,#44]
cmp r4,#0x0         ; pProvider,#0
beq 0x23BB4BE       ; usually taken
... logging code goes here...
... regular code starts at 0x23BB4BE
This processor has no branch predictor, and my understanding is that there is a 2-cycle penalty whenever a branch is taken (and no penalty if the branch is not taken).
I would like the common case, where NULL == pProvider, to be the fast case, where the branch is not taken. How can I make RVCT 4.0 generate code like this?
I've tried using __builtin_expect as follows:
if (__builtin_expect(NULL == pProvider, 1))
    return;
Unfortunately, this has no impact on the generated code. Am I using __builtin_expect incorrectly? Is there another method (hopefully without inline assembly)?
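One restructuring that might be worth trying, shown here as a minimal sketch and not as a confirmed fix: keep only the NULL test in the inline wrapper and move the logging body into a separate out-of-line function, so that the common path contains nothing but the compare and a conditional call. The Provider fields, the int argument, the UNLIKELY macro, and the name log_some_stuff_slow are all hypothetical stand-ins for illustration. Whether RVCT 4.0 honors the hint, or places the rare call so that the common path falls through without a taken branch, is an assumption that would need to be checked in the generated assembly.

```c
#include <stddef.h>

/* Branch hint wrapper. __builtin_expect is a GNU extension; on compilers
 * that do not advertise GNU compatibility the macro degrades to a plain test. */
#if defined(__GNUC__)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#define NOINLINE    __attribute__((noinline))
#else
#define UNLIKELY(x) (x)
#define NOINLINE
#endif

/* Hypothetical provider type, for illustration only. */
typedef struct Provider {
    int records_written;
    int last_value;
} Provider;

/* Rare path: the real logging work, kept out of line so it does not sit
 * between the NULL test and the regular code that follows the call site. */
static NOINLINE void log_some_stuff_slow(Provider *pProvider, int value)
{
    pProvider->last_value = value;
    pProvider->records_written += 1;
}

/* Common path: only the pointer test is inlined at the call site. */
static inline void log_some_stuff(Provider *pProvider, int value)
{
    if (UNLIKELY(pProvider != NULL))
        log_some_stuff_slow(pProvider, value);
}
```

Calling log_some_stuff(NULL, ...) does nothing, and calling it with a valid provider forwards to the out-of-line function. The intent is to give the compiler a chance to turn the rare path into a single conditional call, but the actual layout is up to the compiler and the instruction set in use.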