Is 2-bit prediction always better than 1-bit?

Question

Does 2-bit prediction always better than 1-bit? And from wikipedia, how ‘a loop-closing conditional jump is mispredicted once rather than twice.’ with 2-bit prediction?

According to this answer, 2-bit prediction will have 1 misprediction if with always taken.Then according to this one (see context of 'first and last loop iterations' in this link), 1-bit will also have 1 misprediction which is same as 2-bit prediction if with always taken?

Better if one concrete example with some explanation.

Peter Cordes · Answer 1 · 2023-05-24T03:50:30.553

When a loop exits, a 1-bit counter "learns" that the loop branch is not-taken, so will mispredict the loop branch on the first iteration next time it runs. (Assuming the predictor entry hasn't been overwritten by something else before the loop runs again, e.g. if we're talking about an inner loop inside an outer loop.)

For example, the pattern for an inner loop branch might be TTTTTN TTTTTN TTTN .... A 2-bit counter will predict T every time (once it learns the pattern), mispredicting every N but only those. A 1-bit predictor will predict the previous direction, and ...TN T... has two changes, resulting in two mispredicts per outer-loop iteration.

Wikipedia is correct about 1 and 2-bit predictors, and so are the stack overflow answer about 2-bit counters and the article cs.umd.edu article you linked. They all agree that 1-bit predictors will have 2 mispredicts per loop execution. From that article:

Even in simple iterative loops like this, the 1-bit BHT will cause 2 mispredictions (first and last loop iterations): first time through the loop, when it predicts exit instead of looping, and the end of the loop case, when it exits instead of looping as before

IDK where you're seeing it say that a 1-bit predictor will only have one mispredict.

Presumably you could construct a pathological case where 1-bit does less badly than 2-bit, but I assume any such case would require hard-to-predict patterns that would do badly with both.

Most cases of hard-to-predict branching patterns are at least as good with 2-bit as 1-bit, I think.

Considering aliasing, if two different branches alias the same predictor entry, if one branch is "hot" and strongly taken or strongly not-taken, with a 2-bit predictor, one infrequent execution of an unrelated branch won't shift the prediction away from the correct value before the hot branch runs once again. But a 1-bit predictor will change its prediction.

Thanks, with the context that you say 'on the first iteration next time it runs', I understand the problem. And answering 'where you're seeing it', '1-bit predictor will only have one mispredict' is based on the condition that it take ‘T’ when entering loop, which is wrong when entering the loop the second time. So the 'always taken' or 'always not taken' is only concerned when the program begin, then program use previous program state buffer and prediction buffer. Do my comment have something wrong? — zg c, May 27 '23 at 08:56
@zgc: Yeah, the you're ignoring the mispredict on loop exit, where both 2-bit and 1-bit mispredict. That's where the one mispredict across all loop iterations comes from for a 2-bit predictor. (One mispredict across all N iterations, for an N-iteration loop). If you don't count that for 1-bit but do for 2-bit, then you incorrectly say that both have one mispredict. I think you now understand where the mispredicts will happen, but I'm not sure if you're arguing that your original description of each having "one mispredict" was correct. It's not, for the implied context of "across all iters" — Peter Cordes, May 27 '23 at 09:10
1. Maybe I saied too much in the first comment without explicit delimiter like number which gives misunderstanding. 2. And my description of each having "one mispredict" was wrong after using your context. More detailed see first comment "it take ‘T’ when entering loop" where I only considered the first loop when predictor take 'always taken' strategy if no history prediction buffer offered. — zg c, May 27 '23 at 09:37
Both answers are well. Since the 'Erik Eidt' commentted too much, so take it as solution. — zg c, May 31 '23 at 08:52

Erik Eidt · Accepted Answer · 2023-05-24T14:26:38.683

1

Does 2-bit prediction always better than 1-bit?

We can construct a pathological case for either predictor, and they are not the same case, thus one will be better than the other in some case.

The pathological case for the 1-bit predictor is simple alternation, TNTNTNTNT... The 1-bit predictor will mispredict all branches (save perhaps the first) — whereas the 2-bit predictor will predict some of these correctly, an improvement over no correct predictions.

The pathological case for the 2-bit predictor is double alternation, TTNNTTNNTTNN... The 2-bit predictor mispredict all branches (save perhaps the first) — whereas the 1-bit predictor will predict some of these correctly — again, an improvement over no correct predictions.

How do we construct such cases? Using if-then-else inside a larger loop, where the branch of interest we focus on is the one deciding the then vs. else case, and with a data set / workload that involves this alternation.

(If that construct is itself in an outer loop, then save for the first execution of the inner loop, the if-then-else will have some initial history.)

According to this answer, 2-bit prediction will have 1 misprediction if with always taken.

That answer glosses over the first potential mispredict from the first execution of the inner loop inside the first iteration of the outer loop. It rather (reasonably correctly) points out that the second and beyond iteration of the inner loop (inside the outer loop), there is history to predict the branch taken, so there should be no question of a miss on the first iteration of the inner loop (in the second and beyond iteration of the outer loop).

(That answer also transforms both loops from for-loop to do-while-loop — which don't have the same semantics in the general case, though this is a valid to do because the loops are counted and we know by the counts they will each iterate at least once (this transformation is an efficiency improvement — there are other such transformations for the more general case that involve initially jumping to an exit condition at the end, or, repeating the exit condition test outside the loop followed by the do-while form)).

Then according to this one (see context of 'first and last loop iterations' in this link), 1-bit will also have 1 misprediction which is same as 2-bit prediction if with always taken?

The first + last adds up to 2 mispredictions per execution of the loop, assuming repeated execution of the loop as a whole, and that the loop itself iterates rather than does not iterate. Assuming other things being equal, the 2-bit predictor will mispredict only last, but not first (in the context of repeated iteration of the loop as a whole).

edited May 24 '23 at 14:26

answered May 24 '23 at 14:00

Erik Eidt

23,049
2
29
53

The `for` to `do{}while()` transformation is totally standard even when we don't know the trip-count is always non-zero. In that case there'd be a *separate* branch ahead of the loop, with its own prediction. (Never taken as long as the trip counts seen in practice are always non-zero.) So the actual loop branch will be at the bottom even with `for( int i=0 ; i < n ; i++)` for runtime-variable `n`. (As in [this Q&A](https://stackoverflow.com/q/47783926)). So IMO it's appropriate to gloss over this when talking about loops, although fine to mention this detail. – Peter Cordes May 24 '23 at 14:07
1. Could you speak more detailedly with simple codes about 'then save for the first execution of the inner loop, the if-then-else will have some initial history.' although I have understand your elegant examples 'TNTNTNTNT' and 'TTNNTTNNTTNN' 2. Is you answer '(That answer also transforms ... ))' all have more detailed infos in [Agner's doc](https://www.agner.org/optimize/optimizing_assembly.pdf)? If so, Could you offer the specific chapter or page infos. – zg c May 27 '23 at 09:29
First iteration is a logical concept, where first execution is a more physical concept. For any code, let's say a function, it may have been run previously to completion, say, returned to its caller, and now is invoked and running again. If the first invocation and second invocation are close enough together, then the second invocation will be in the history cache. – Erik Eidt May 27 '23 at 14:06
Now if we take that to nested loops, then imagine the outer loop is on its second iteration.. Looking back, that means the outer loop, did run its first iteration, and thus executed the inner loop so now all the branches, both inner & outer loop, are thus known to the history cache. Getting back to the first iteration of the inner loop within the second iteration of the outer loop, all branches of both loops are in the cache, which is not not necessarily the case with the first iteration of the inner loop within the first iteration of the outer loop. – Erik Eidt May 27 '23 at 14:11
So, hopefully you see that the concepts of having run/executed before and being on/at first iteration are separate. First iteration may or may not have history in the cache, whereas the second iteration surely does due to the first iteration. First iteration may have history in the cache if the whole loop ran previously and is still in the cache. – Erik Eidt May 27 '23 at 14:13
I am not familiar with his text. There are many loop optimizations. The obvious and naïve for- and while- loop have a condition test at the beginning involving a conditional branch (that tests when to exit the loop by skipping ahead past the whole loop to the next logical statement after the loop), and then also an unconditional branch at the bottom of the loop body to repeat the loop, for next iteration starting with the exit test. This pattern has at least 2 branches per loop, one conditional at the top and one unconditional at the bottom. – Erik Eidt May 27 '23 at 14:22
By rearranging the code, the unconditional branch can always be removed, lessening the number of branches per iteration. The simplest approach is to transform the while-style loop into a do-while-style loop. Semantically, though these are not logical equivalents, due to the fact that a while-loop iterates 0 or more times, whereas a do-while loop iterates 1 or more times. – Erik Eidt May 27 '23 at 14:28
If the loop is known to iterate >=1 time (which is generally the case when constant start and stop indexes are specified, e.g. for i = 1 to 50) then the simple transformation is valid, since the semantic difference won't show itself. – Erik Eidt May 27 '23 at 14:29
When the loop iteration count could be zero, i.e. we cannot tell at compile time and need a dynamic/runtime check, we can still do the transformation, but have to add a guard of some sort that exits the transformation without iterating at all loop if this time the loop goes zero iterations. – Erik Eidt May 27 '23 at 14:32
That guard can take two forms: one is to use an if-then surrounding the entire do-while loop transformation (the do-while is nested within the then-part of the if-then), where the if-then condition is the same as the while-loop condition; it simply tests whether the do-while should run at all. The if-then, being outside the loop, adds no overhead to the loop itself. – Erik Eidt May 27 '23 at 14:34
The other form requires the use of a goto in high level language, very unstructured but perfectly fine in assembly. It involves placing a goto before the do-while loop, and this goto jumps into the loop (considered bad form in HLL, as against structured programming), skipping the body of the loop and going right to the while clause of the do-while loop. Thus, the while condition test is executed before the first iteration, mitigating the semantic differences. The goto, also being outside and before the loop, adds no overhead to the loop itself. – Erik Eidt May 27 '23 at 14:37
Thanks for your comment ['So, hopefully you see'](https://stackoverflow.com/questions/76319923/is-2-bit-prediction-always-better-than-1-bit#comment134629560_76324375) and ['By rearranging the code'](https://stackoverflow.com/questions/76319923/is-2-bit-prediction-always-better-than-1-bit#comment134629659_76324375) based on their context – zg c May 31 '23 at 08:39
1- Your last (which is the original [answer](https://stackoverflow.com/questions/47783926/why-are-loops-always-compiled-into-do-while-style-tail-jump) case) and Second to last comment supplemented the your " 'By rearranging the code' " comment. Thanks for such detailed comments. 2-Although I expect one simple code as [my comment one](https://stackoverflow.com/questions/76319923/is-2-bit-prediction-always-better-than-1-bit#comment134627720_76324375) , I have understanded what you mean now. 3- Maybe I asked one duplicate question in my comment one since Peter has said 'As in this Q&A' – zg c May 31 '23 at 08:47

Is 2-bit prediction always better than 1-bit?

2 Answers2