Pentium 4 had branch hints (encoded as segment-override prefixes: 0x2E for not-taken, 0x3E for taken), but they tended not to be useful in practice, so later CPUs silently ignore them. (See Intel x86 0x2E/0x3E Prefix Branch Prediction actually used?) This was x86's only experiment with hardware branch hints. (Some other ISAs have had them too, and may still support them.)
See also Is it possible to tell the branch predictor how likely it is to follow the branch?, including my answer on that question. You can hint the compiler how to lay out if bodies, e.g. so the common case is the fall-through path instead of a taken branch, which does help. That's what GNU C __builtin_expect is for, as are the C++20 [[likely]] and [[unlikely]] attributes.
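As a rough illustration, here is a minimal sketch of the common LIKELY/UNLIKELY wrapper macros around __builtin_expect (a GCC/Clang extension; the macro names and the sum_checked function are hypothetical). The hint only tells the compiler which path to keep as fall-through; nothing in the resulting machine code is a hint the CPU sees. In C++20 the same intent is written as if (cond) [[unlikely]] { ... }.

```c
#include <stddef.h>

/* Hypothetical convenience macros; __builtin_expect exists in GCC and Clang.
   Other compilers get a no-op fallback. */
#if defined(__GNUC__)
#define LIKELY(x)   __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define LIKELY(x)   (x)
#define UNLIKELY(x) (x)
#endif

/* Sums the non-negative elements and counts the negative ones.
   Assumes negative values are rare, so that path is marked UNLIKELY and the
   compiler may move it out of the hot fall-through path. */
long sum_checked(const int *a, size_t n, size_t *n_negative) {
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (UNLIKELY(a[i] < 0)) {
            ++*n_negative;   /* rare path */
            continue;
        }
        sum += a[i];         /* common path */
    }
    return sum;
}
```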
But those compiler hints don't interact with the CPU's hardware branch prediction, except in the sense that they may convince the compiler to emit branchy code instead of branchless code (such as a cmov) in cases where branchless was possible, so that there is a branch that needs predicting at all.
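To make the branchy-versus-branchless distinction concrete, here is a sketch of the same clamp written two ways (function names are hypothetical; __builtin_expect assumes GCC or Clang). Compilers frequently turn the plain ternary into a cmov on x86-64, but that is not guaranteed, and whether the hinted version actually becomes a branch depends on the compiler and options. Both compute the same result.

```c
#include <stddef.h>

/* Plain select: a good candidate for a branchless cmov, in which case there
   is no branch for the predictor to predict (compiler-dependent). */
long sum_clamped(const int *a, size_t n, int hi) {
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        int x = a[i];
        sum += (x > hi) ? hi : x;
    }
    return sum;
}

/* Same result, but the hint says clamping is rare, which may push the
   compiler toward a conditional branch instead of a cmov. */
long sum_clamped_hinted(const int *a, size_t n, int hi) {
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        int x = a[i];
        if (__builtin_expect(x > hi, 0))
            x = hi;
        sum += x;
    }
    return sum;
}
```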
If you have a special iteration with known branching, you can peel it and optimize it specially before entering the actual loop.
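A minimal sketch of peeling, using a hypothetical running-difference loop where only iteration 0 is special: the i == 0 test is checked every time in the first version, while the peeled version handles it once up front and leaves the loop body with no branch on i.

```c
#include <stddef.h>

/* Before: the i == 0 check runs every iteration but is true only once. */
void deltas(const int *in, int *out, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (i == 0)
            out[i] = in[i];
        else
            out[i] = in[i] - in[i - 1];
    }
}

/* After peeling: the special first iteration is done before the loop. */
void deltas_peeled(const int *in, int *out, size_t n) {
    if (n == 0)
        return;
    out[0] = in[0];                      /* peeled iteration */
    for (size_t i = 1; i < n; i++)
        out[i] = in[i] - in[i - 1];      /* unconditional body */
}
```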
Or if your branch is always taken for the first part of a loop and then always not-taken, you can split your loop into two halves: one optimized for the always-true case and the other for the always-false case.
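Here is a sketch of that split, assuming a hypothetical loop whose condition is i < k: true for a prefix, false afterwards. Splitting at k gives two loops with unconditional bodies; the split point is clamped to n so a k larger than n stays in bounds.

```c
#include <stddef.h>

/* Before: the branch is taken for i < k, then never again. */
void scale_prefix(const int *in, int *out, size_t n, size_t k) {
    for (size_t i = 0; i < n; i++) {
        if (i < k)
            out[i] = in[i] * 2;
        else
            out[i] = in[i];
    }
}

/* After splitting at k: one loop per branch direction. */
void scale_prefix_split(const int *in, int *out, size_t n, size_t k) {
    size_t mid = k < n ? k : n;          /* clamp the split point to n */
    size_t i = 0;
    for (; i < mid; i++)
        out[i] = in[i] * 2;              /* always-true half */
    for (; i < n; i++)
        out[i] = in[i];                  /* always-false half */
}
```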
Or if the branch direction is totally loop-invariant, only one of the loop versions ever needs to run. That's a form of loop versioning (compilers call it loop unswitching). Turning a branch inside a loop into 2 separate loops costs code size, but can be a win if the loop runs frequently, especially if the now-unconditional code can then be optimized together with the surrounding code.
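A sketch of unswitching by hand, with a hypothetical loop-invariant flag: the flag is tested once outside, and each copy of the loop has a branch-free body that the compiler is free to optimize (e.g. vectorize) on its own. GCC can do this transformation itself with -funswitch-loops, which -O3 enables.

```c
#include <stdbool.h>
#include <stddef.h>

/* Before: `subtract` never changes, but it is tested every iteration. */
long accumulate(const int *a, size_t n, bool subtract) {
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (subtract)
            sum -= a[i];
        else
            sum += a[i];
    }
    return sum;
}

/* After unswitching: two loop versions, only one of which runs. */
long accumulate_unswitched(const int *a, size_t n, bool subtract) {
    long sum = 0;
    if (subtract) {
        for (size_t i = 0; i < n; i++)
            sum -= a[i];
    } else {
        for (size_t i = 0; i < n; i++)
            sum += a[i];
    }
    return sum;
}
```

The cost is the duplicated loop body, which is the code-size tradeoff mentioned above.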
Your case appears to be totally loop-invariant after the first iteration, so peel that first iteration to use the initial v (from the function arg); later iterations can then either infinite-loop or do a single iteration.
Not really much to gain here, though; hoisting the p[5] load and using a compare-and-branch on it at the bottom of the loop would appear to do the trick.
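The question's loop isn't reproduced here, so the following is only a sketch on a hypothetical loop of a similar shape: v starts as the function argument and is reloaded from p[5] at the end of each iteration, while sum, len and memberVar are treated as plain locals or parameters. After peeling the first iteration, the p[5] load can be hoisted because nothing in the loop writes memory, and the remaining branch goes the same way every iteration.

```c
#include <stddef.h>

/* HYPOTHETICAL loop shape for illustration, not the question's actual code.
   Requires p to have at least max(len, 6) elements. */
long loop_original(const int *p, int v, size_t len, int memberVar) {
    long sum = 0;
    for (size_t i = 0; i < len; i++) {
        if (v == memberVar)
            sum += p[i];
        v = p[5];                /* from here on v is always p[5] */
    }
    return sum;
}

/* Peeled first iteration plus hoisted load of p[5]. */
long loop_peeled(const int *p, int v, size_t len, int memberVar) {
    long sum = 0;
    if (len == 0)
        return sum;
    if (v == memberVar)          /* peeled iteration uses the argument v */
        sum += p[0];
    const int v5 = p[5];         /* loop-invariant: p is never written */
    for (size_t i = 1; i < len; i++) {
        if (v5 == memberVar)     /* same direction on every iteration */
            sum += p[i];
    }
    return sum;
}
```

A compiler is allowed to make this transformation on its own, which is why there isn't much to gain from doing it by hand.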
Your p isn't modified in the loop, and p[5] is only ever read, never written. So after the first iteration, v = p[5] is loop-invariant. (This assumes the code is C-like pseudocode, just without the semicolons, so data-race undefined behaviour can be assumed not to happen, i.e. there are no asynchronous modifications to the data p points to.)
len, sum, and memberVar aren't declared, but unless one of them is a C++ reference to the same int as p[5], there's no possible aliasing.
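To see why that aliasing caveat matters, here is a sketch in C, where a pointer plays the role of the C++ reference (function names are hypothetical). If the counter might point at p[5], each store can change p[5], so the compiler must reload it every iteration and can't hoist it. The hoisted version is only equivalent when there is no such alias; in C, restrict-qualifying the pointers is one way to promise the compiler that.

```c
#include <stddef.h>

/* Stores through `counter`; since it might point at p[5], the compiler
   has to re-read p[5] after every store. */
void drain_via_ptr(const int *p, size_t len, int *counter) {
    for (size_t i = 0; i < len; i++) {
        if (p[5] > 0)
            *counter -= 1;
    }
}

/* Hoisted version: reads p[5] once and works on a local copy.
   Only equivalent when counter does NOT point at p[5]. */
void drain_hoisted(const int *p, size_t len, int *counter) {
    const int v = p[5];
    int c = *counter;
    for (size_t i = 0; i < len; i++) {
        if (v > 0)
            c -= 1;
    }
    *counter = c;
}
```

With separate objects both functions agree; when counter points at p[5], the first version stops decrementing once p[5] reaches 0, while the hoisted one keeps going.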