In the question Why is it faster to process a sorted array than an unsorted array? the answer had to do with branch prediction being bad with randomly ordered data and good with sorted data. But the data itself is the same; only the ordering is different.

That confused me: I pretty much assumed the prediction was compiled in, either always assume true or always assume false, but apparently it's dynamic. Since it's dynamic, how does it know which way to predict? Is there data on it in a cache somewhere? I've heard of the instruction cache and RAM caching; is the prediction data kept in the CPU cache? Is it only L1?

How exactly does branch prediction adjust its prediction, and where does it keep the data for that dynamic decision?

  • You didn't read the original post carefully enough. Branch prediction works in CPUs because most of the time *the software takes one branch more often than the other*, so the CPU just "guesses" that the more well-travelled branch is the correct one (which it will be, most of the time). Branch prediction on an unsorted array fails because both branches are equally favored. – Robert Harvey Oct 11 '12 at 19:22
  • The details depend on the particular CPU, but modern high-end CPUs typically have quite sophisticated branch predictors which keep tables of recorded branch outcomes and use these for future branch prediction: http://en.wikipedia.org/wiki/Branch_predictor – Paul R Oct 11 '12 at 19:24
  • @RobertHarvey I'm asking how it remembers which is the more well-travelled branch. – Oct 11 '12 at 19:27
  • What can we add that the Wikipedia article doesn't adequately cover? – Robert Harvey Oct 11 '12 at 19:29
  • @RobertHarvey: I was reading the wiki; you didn't give me a chance to reply. What could be added is how the 'Next line prediction' in the [Branch target predictor](http://en.wikipedia.org/wiki/Branch_target_predictor) works. It mentions the i7 uses 'Next line prediction' and that it's a target predictor, but it doesn't explain how it predicts (nor does the branch target predictor wiki page). It also says there is a second predictor, with no explanation. A lot of the predictors described look like 80s/90s hardware and embedded CPUs. – Oct 11 '12 at 19:47
  • One says the memory is in the instruction set, but no other says where. Another said 10%. 10% of what? It doesn't mention whether the prediction is lost if it leaves the L1 cache, and another mentions contention, which is confusing. It mentions sharing the same table but doesn't define the table. – Oct 11 '12 at 19:49
  • Don't confuse "branch predictors" with "branch target predictors". They are two different things. I assume you're asking about the first? – Mysticial Oct 11 '12 at 20:01

1 Answer

The question you are asking is about branch target prediction, but this doesn’t come into play in this particular example. The only difference here is whether the branch is taken or not.

The code in this case boils down to:

if (data[c] >= 128)
  sum += data[c];

This sits inside a loop, so in assembly it is going to look something like:

loop:                ; (load of data[c] and index update elided)
cmp esi, 128         ; data[c] >= 128 ?
jl loop              ; this is the branch being predicted
add eax, esi         ; sum += data[c]
jmp loop

On the jump-less-than, the branch predictor has to guess whether to go back to the start of the loop or fall through. If it guesses right, the code runs faster, because the instruction queue stays well fed and there are no bubbles or stalls in the pipeline.

Following the weather-forecast principle (tomorrow's weather will be much the same as today's), the CPU is faster if it can remember which way the branch went last time. You can think of this as a single Boolean value, stored inside the CPU's branch-prediction hardware, which records whether or not the branch was taken the last time, with the predictor guessing the same again. After each branch is resolved (taken or not), the value is updated for next time.
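
As a toy model of that idea, here is a minimal one-bit predictor sketch in C++. The struct name and interface are invented for illustration; a real predictor is fixed-function hardware inside the CPU, not code:

// Toy one-bit predictor: remembers only the last outcome.
// Purely illustrative; real predictors are hardware structures, not software.
struct OneBitPredictor {
    bool lastTaken = false;                        // the single stored bit

    bool predict() const { return lastTaken; }     // guess: same as last time

    void update(bool taken) { lastTaken = taken; } // record what actually happened
};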

If you have a pattern of, e.g., aaaaAAAA and you're branching on whether each character is a capital, then with this simple prediction you'll be right all of the time except at the aA transition. If the data is aAaAaAaA, you will be wrong essentially all the time.
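
Running the toy predictor above over those two patterns shows exactly this; treating "is a capital letter" as the branch condition is just a stand-in for the real test:

#include <cstdio>
#include <cstring>

int main() {
    const char* patterns[] = { "aaaaAAAA", "aAaAaAaA" };
    for (const char* p : patterns) {
        OneBitPredictor bp;   // the sketch from above
        int misses = 0;
        for (size_t i = 0; i < strlen(p); ++i) {
            bool taken = (p[i] >= 'A' && p[i] <= 'Z'); // "branch" on capitals
            if (bp.predict() != taken) ++misses;
            bp.update(taken);
        }
        printf("%s: %d mispredictions out of %zu\n", p, misses, strlen(p));
    }
    return 0;
}

The first pattern should report a single misprediction (the aA transition); the alternating pattern mispredicts on every character after the first.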

So the pattern of the data determines whether the taken/not-taken record is close to what actually happens. In this case, if you sort the data, all the small values end up at the start, and the branch predictor will have good success predicting correctly. If the data is random, the branch predictor is much less useful, since it will be right only about half the time.
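
The same toy predictor makes the sorted-versus-random difference visible. Assuming the setup from the original question (32768 values in 0..255, branching on >= 128), a sketch:

#include <algorithm>
#include <cstdio>
#include <random>
#include <vector>

int main() {
    std::vector<int> data(32768);          // size and value range as in the original question
    std::mt19937 rng(42);
    for (int& v : data) v = rng() % 256;

    auto countMisses = [](const std::vector<int>& d) {
        OneBitPredictor bp;                // the sketch from above
        int misses = 0;
        for (int v : d) {
            bool taken = (v >= 128);
            if (bp.predict() != taken) ++misses;
            bp.update(taken);
        }
        return misses;
    };

    printf("random: %d mispredictions\n", countMisses(data));
    std::sort(data.begin(), data.end());
    printf("sorted: %d mispredictions\n", countMisses(data));
    return 0;
}

Expect roughly half the branches to be mispredicted in the random order, and essentially a single misprediction once the data is sorted.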

Modern branch predictors are much more complicated than this, so the above is a simplistic view. But if you think of the branch-prediction state as a single taken/not-taken Boolean held in a memory cell that is part of the branch predictor logic, it might make sense.
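
For example, one classic refinement over the single bit is the two-bit saturating counter (described in the Wikipedia article linked in the comments): it takes two wrong guesses in a row before the prediction flips. A hedged sketch, in the same toy style as above:

// Toy two-bit saturating counter, states 0..3:
// 0 and 1 predict "not taken"; 2 and 3 predict "taken".
// A single surprise nudges the counter; only two in a row flip the prediction.
struct TwoBitPredictor {
    int state = 1;  // start weakly "not taken" (an arbitrary choice)

    bool predict() const { return state >= 2; }

    void update(bool taken) {
        if (taken) { if (state < 3) ++state; }
        else       { if (state > 0) --state; }
    }
};

The benefit shows up on mostly-one-way branches such as the loop branch itself: one surprise (e.g. the loop's final exit) no longer flips the prediction for the next time around.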

AlBlue