For branch prediction, the BHT (branch history table) is indexed by the branch's virtual address. Aliasing occurs when two or more branches hash to the same BHT entry, and the resulting conflicts reduce prediction accuracy.
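To make the aliasing concrete, here is a minimal toy model, not a description of any real CPU. The table size (`BHT_ENTRIES`), the index function (low address bits), and the 2-bit saturating counters are all assumptions for illustration; real predictors use undocumented hashes and also mix in branch history.

```c
#include <stdint.h>

/* Assumed table size (power of two); real sizes are not documented. */
#define BHT_ENTRIES 4096u

/* One 2-bit saturating counter per entry: 0,1 = not taken; 2,3 = taken. */
static uint8_t bht[BHT_ENTRIES];

/* Hypothetical index function: low bits of the branch's virtual address. */
static inline uint32_t bht_index(uint64_t branch_addr) {
    return (uint32_t)(branch_addr & (BHT_ENTRIES - 1));
}

/* Predict taken (1) or not taken (0) from the shared counter. */
static inline int bht_predict(uint64_t branch_addr) {
    return bht[bht_index(branch_addr)] >= 2;
}

/* Train the counter with the actual outcome of the branch. */
static inline void bht_update(uint64_t branch_addr, int taken) {
    uint8_t *counter = &bht[bht_index(branch_addr)];
    if (taken) {
        if (*counter < 3) (*counter)++;
    } else {
        if (*counter > 0) (*counter)--;
    }
}
```

In this model, two branches whose addresses differ by a multiple of `BHT_ENTRIES` share one counter, so training one of them changes the prediction for the other.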
The BTB (branch target buffer) is part of the branch history information. So what BTB eviction scheme do modern CPUs use (compared with cache eviction)? Does a taken branch always allocate a BTB entry immediately, evicting an existing entry if the set has no free one?
Actually, I ran some tests, described below.
First, I defined two functions, each consisting of N taken forward jmps.
flush_btb:
0x0000000000001439 <+0>: endbr64
0x000000000000143d <+4>: push %rbp
0x000000000000143e <+5>: mov %rsp,%rbp
0x0000000000001441 <+8>: mov %edi,-0x4(%rbp)
0x0000000000001444 <+11>: jmp 0x1446 <flush_btb+13>
0x0000000000001446 <+13>: jmp 0x1448 <flush_btb+15>
0x0000000000001448 <+15>: jmp 0x144a <flush_btb+17>
0x000000000000144a <+17>: jmp 0x144c <flush_btb+19>
<repeat 32K times>
test_branches:
0x00000000000115cb <+0>: endbr64
0x00000000000115cf <+4>: push %rbp
0x00000000000115d0 <+5>: mov %rsp,%rbp
0x00000000000115d3 <+8>: mov %edi,-0x4(%rbp)
0x00000000000115d6 <+11>: jmp 0x115d8 <test_branches+13>
0x00000000000115d8 <+13>: jmp 0x115da <test_branches+15>
0x00000000000115da <+15>: jmp 0x115dc <test_branches+17>
0x00000000000115dc <+17>: jmp 0x115de <test_branches+19>
<repeat 4K times>
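For readers who want to reproduce the listings, one possible way to generate such jmp chains is sketched below. It assumes GCC or Clang with GNU assembler syntax on x86, and it is not necessarily how the functions above were written. The macro names (`JMP_CHAIN`, `TEST_JMPS`, `FLUSH_JMPS`) are made up for this example.

```c
/* Chain lengths taken from the listings: 4K and 32K jmps. */
#define TEST_JMPS  4096
#define FLUSH_JMPS 32768

/* Two-step stringification so the macro value, not its name, is pasted. */
#define STR2(x) #x
#define STR(x)  STR2(x)

#if defined(__x86_64__) || defined(__i386__)
/* Emit n copies of "jmp to the very next instruction" via the
   assembler's .rept directive; "1f" means the next local label 1. */
#define JMP_CHAIN(n)                          \
    __asm__ volatile(".rept " STR(n) "\n\t"   \
                     "jmp 1f\n"               \
                     "1:\n\t"                 \
                     ".endr"                  \
                     : : : "memory")
#else
/* Not x86: leave the body empty so the file still compiles. */
#define JMP_CHAIN(n) ((void)0)
#endif

/* The int parameter mirrors the "mov %edi,-0x4(%rbp)" in the listings. */
void test_branches(int unused) { (void)unused; JMP_CHAIN(TEST_JMPS); }
void flush_btb(int unused)     { (void)unused; JMP_CHAIN(FLUSH_JMPS); }
```

Each `jmp 1f` to the immediately following label should assemble to a 2-byte short jump, which matches the 2-byte address stride in the disassembly.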
Then I ran the code below on an i7-7700 CPU (Kaby Lake):
test_branches() # Expect the branches to enter the BTB.
flush_btb() # Expect most of the previous BTB entries to be evicted.
test_branches() # Then measure the branch-miss event for this function.
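The measurement step could look roughly like the sketch below, which counts branch instructions and branch misses around a single call using the Linux `perf_event_open` syscall. This is an assumption about tooling, not necessarily how the numbers in this post were collected. The helper names (`open_hw_counter`, `count_branch_misses`, `sample_workload`) are hypothetical, and the generic `PERF_COUNT_HW_BRANCH_MISSES` event may map to a different raw event than the one discussed here.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Open one generic hardware counter for the calling thread, user space
   only. Returns a file descriptor, or -1 if perf events are unavailable
   (for example, because of perf_event_paranoid or a container). */
static int open_hw_counter(uint64_t config) {
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof attr);
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof attr;
    attr.config = config;
    attr.disabled = 1;        /* start stopped; enabled explicitly below */
    attr.exclude_kernel = 1;
    attr.exclude_hv = 1;
    return (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
}

/* Run fn(arg) once with both counters enabled.
   Returns 0 and fills *branches and *misses on success, -1 on failure. */
static int count_branch_misses(void (*fn)(int), int arg,
                               uint64_t *branches, uint64_t *misses) {
    int fd_b = open_hw_counter(PERF_COUNT_HW_BRANCH_INSTRUCTIONS);
    int fd_m = open_hw_counter(PERF_COUNT_HW_BRANCH_MISSES);
    int rc = -1;
    if (fd_b >= 0 && fd_m >= 0) {
        ioctl(fd_b, PERF_EVENT_IOC_RESET, 0);
        ioctl(fd_m, PERF_EVENT_IOC_RESET, 0);
        ioctl(fd_b, PERF_EVENT_IOC_ENABLE, 0);
        ioctl(fd_m, PERF_EVENT_IOC_ENABLE, 0);
        fn(arg);
        ioctl(fd_m, PERF_EVENT_IOC_DISABLE, 0);
        ioctl(fd_b, PERF_EVENT_IOC_DISABLE, 0);
        if (read(fd_b, branches, sizeof *branches) == (ssize_t)sizeof *branches &&
            read(fd_m, misses, sizeof *misses) == (ssize_t)sizeof *misses)
            rc = 0;
    }
    if (fd_b >= 0) close(fd_b);
    if (fd_m >= 0) close(fd_m);
    return rc;
}

/* Hypothetical stand-in workload; replace with the function under test. */
static void sample_workload(int n) {
    volatile int sink = 0;
    for (int i = 0; i < n; i++) sink += i & 1;
}
```

A caller would run the first `test_branches()` and `flush_btb()` unmeasured, then pass `test_branches` to `count_branch_misses` for the final call. The counts also include the few branches in the enable and disable paths, so small offsets from the nominal 4096 are expected.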
I expected the branch-miss count to be high, but it isn't.
Branch instructions: 4106
Branch misses: 5 (0.12%)
It seems that on this CPU, predicting a jmp doesn't need BTB info.
I also tried an i9-12900H CPU (Alder Lake). There, flush_btb does have an effect: the branch-miss count increases to the hundreds, compared with the run without flush_btb.
As mentioned by @PeterCordes, BR_MISP_RETIRED.ALL_BRANCHES may not count direct (non-indirect) jmps. So I changed the 'jmp' to 'jne', and the result is different: the branch-miss rate varies from 10% to 83%.
Branch instructions: 4107
Branch misses: 419 (10.20%)
Branch instructions: 4107
Branch misses: 2404 (58.53%)
Branch instructions: 4107
Branch misses: 3216 (78.31%)
Branch instructions: 4107
Branch misses: 471 (11.47%)
Branch instructions: 4107
Branch misses: 3438 (83.71%)
If I remove the call to flush_btb(), the branch-miss rate varies from 6% to 64%.
Branch instructions: 4107
Branch misses: 278 (6.77%)
Branch instructions: 4107
Branch misses: 819 (19.94%)
Branch instructions: 4107
Branch misses: 968 (23.57%)
Branch instructions: 4107
Branch misses: 2631 (64.06%)
Branch instructions: 4108
Branch misses: 1005 (24.46%)