The question you are asking is about branch target prediction, but this doesn’t come into play in this particular example. The only difference here is whether the branch is taken or not.
The code in this case boils down to:
if (data[c] >= 128)
    sum += data[c];
This runs in a loop, so the assembly is going to look something like:
loop:
    ; load the next element of data into esi (omitted)
    cmp esi, 128
    jl  loop        ; this is the branch being predicted
    add eax, esi    ; sum += data[c]
    jmp loop
At the jump-less-than (jl), the branch predictor has to guess whether execution will go back to the start of the loop or fall through. If it guesses right, the code runs faster because the instruction queue stays well fed, with no bubbles or stalls in the pipeline.
Following the weather-forecast principle (tomorrow's weather will be much the same as today's), if the CPU can remember which way the branch went last time, the code will be faster. You can think of this as a single Boolean value stored inside the CPU that records whether or not the jump was taken last time, with the predictor guessing the same outcome again. After each jump is taken (or not), the value is updated for next time.
If you have a pattern of, say, aaaaAAAA and the branch depends on whether each character is a capital or not, this simple scheme predicts correctly every time except at the aA transition. If the data is aAaAaAaA, it is wrong on every iteration after the first.
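To make the single-Boolean model concrete, here is a small sketch (my illustration, not anything the CPU literally runs) that simulates a one-bit "same as last time" predictor and counts its mispredictions on the two patterns above:

#include <iostream>
#include <string>

// Count the mispredictions of a one-bit "same as last time" predictor,
// treating "is this character a capital?" as the branch outcome.
int mispredictions(const std::string& pattern) {
    bool prediction = false;   // initial guess: not taken
    int misses = 0;
    for (char ch : pattern) {
        bool taken = (ch >= 'A' && ch <= 'Z');  // the actual outcome
        if (taken != prediction)
            ++misses;
        prediction = taken;    // remember the last outcome for next time
    }
    return misses;
}

int main() {
    std::cout << "aaaaAAAA: " << mispredictions("aaaaAAAA") << " misses\n";  // 1 (only the aA transition)
    std::cout << "aAaAaAaA: " << mispredictions("aAaAaAaA") << " misses\n";  // 7 (wrong after every flip)
}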
So the pattern in the data determines whether the taken/not-taken record tracks what actually happens or keeps lagging behind it. In this case, if you sort the data, all the small values come first and the branch goes the same way for long runs, so the predictor will have good success. With random data, the outcome changes unpredictably from one iteration to the next, and the predictor is far less useful.
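If you want to see the effect yourself, here is a minimal sketch of the kind of loop being discussed; comment the std::sort line in or out to compare. The exact numbers depend on your CPU, and an optimizing compiler may turn the branch into a conditional move or vectorize the loop, which hides the effect, so treat this as an illustration rather than a rigorous benchmark.

#include <algorithm>
#include <chrono>
#include <cstdlib>
#include <iostream>
#include <vector>

int main() {
    std::vector<int> data(32768);
    for (int& v : data)
        v = std::rand() % 256;

    std::sort(data.begin(), data.end());   // comment this out for the unsorted case

    long long sum = 0;
    auto start = std::chrono::steady_clock::now();
    for (int pass = 0; pass < 10000; ++pass)
        for (int v : data)
            if (v >= 128)          // the branch whose outcome is being predicted
                sum += v;
    auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
                  std::chrono::steady_clock::now() - start).count();

    std::cout << "sum = " << sum << ", time = " << ms << " ms\n";
}

With the sort in place the branch goes the same way for long stretches, so on typical hardware this loop runs noticeably faster than the unsorted version of the same code.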
Modern branch predictors are much more complicated, so this is a simplistic view. But if you picture the prediction state as a single taken/not-taken Boolean held in a memory cell inside the branch predictor logic, the behaviour should make sense.
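As one small example of "more complicated" (again my illustration, not part of the answer above): a classic refinement is a two-bit saturating counter, which has to be wrong twice in a row before it flips its prediction, so a single odd outcome in a long run costs only one miss:

#include <algorithm>
#include <iostream>
#include <string>

// Two-bit saturating counter: states 0-1 predict "not taken", 2-3 predict "taken".
// It takes two mispredictions in a row to change the predicted direction.
int mispredictions2bit(const std::string& pattern) {
    int state = 0;   // start at "strongly not taken"
    int misses = 0;
    for (char ch : pattern) {
        bool taken = (ch >= 'A' && ch <= 'Z');
        if ((state >= 2) != taken)
            ++misses;
        state = taken ? std::min(state + 1, 3) : std::max(state - 1, 0);
    }
    return misses;
}

int main() {
    // One stray capital in a run of lowercase costs a single miss, and the
    // predictor stays biased towards "not taken"; the one-bit scheme above
    // would miss twice (on the A and again on the a right after it).
    std::cout << mispredictions2bit("aaaaAaaaa") << " misses\n";  // 1
}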