The branch-loads and branch-load-misses events are equivalent to the branch-instructions and branch-misses events.
They are simply two events of type PERF_TYPE_HW_CACHE, perf's generic abstraction for hardware cache events, which treats the branch prediction unit (BPU) as cache-like.
$ strace -e perf_event_open perf stat -e branch-loads,branch-load-misses ls
perf_event_open({type=PERF_TYPE_HW_CACHE, size=0x88 /* PERF_ATTR_SIZE_??? */, config=PERF_COUNT_HW_CACHE_BPU|PERF_COUNT_HW_CACHE_OP_READ<<8|PERF_COUNT_HW_CACHE_RESULT_ACCESS<<16, ...}, 2512745, -1, -1, PERF_FLAG_FD_CLOEXEC) = 4
perf_event_open({type=PERF_TYPE_HW_CACHE, size=0x88 /* PERF_ATTR_SIZE_??? */, config=PERF_COUNT_HW_CACHE_BPU|PERF_COUNT_HW_CACHE_OP_READ<<8|PERF_COUNT_HW_CACHE_RESULT_MISS<<16, ...}, 2512745, -1, -1, PERF_FLAG_FD_CLOEXEC) = 5
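The config values in the strace output follow the PERF_TYPE_HW_CACHE encoding: cache id in the low byte, operation shifted left by 8, result shifted left by 16. A minimal sketch of that packing, assuming the constants from <linux/perf_event.h>; the helper names hw_cache_config and print_branch_cache_configs are hypothetical and not part of perf or the kernel:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <linux/perf_event.h>

/*
 * Hypothetical helper (not a perf or kernel API): pack a
 * PERF_TYPE_HW_CACHE config as id | (op << 8) | (result << 16).
 */
uint64_t hw_cache_config(unsigned int cache_id, unsigned int op,
                         unsigned int result)
{
    assert(cache_id < (unsigned int)PERF_COUNT_HW_CACHE_MAX);
    assert(op < (unsigned int)PERF_COUNT_HW_CACHE_OP_MAX);
    assert(result < (unsigned int)PERF_COUNT_HW_CACHE_RESULT_MAX);

    return (uint64_t)cache_id |
           ((uint64_t)op << 8) |
           ((uint64_t)result << 16);
}

/* Hypothetical helper: print the two configs seen in the strace output. */
void print_branch_cache_configs(void)
{
    uint64_t loads = hw_cache_config(PERF_COUNT_HW_CACHE_BPU,
                                     PERF_COUNT_HW_CACHE_OP_READ,
                                     PERF_COUNT_HW_CACHE_RESULT_ACCESS);
    uint64_t misses = hw_cache_config(PERF_COUNT_HW_CACHE_BPU,
                                      PERF_COUNT_HW_CACHE_OP_READ,
                                      PERF_COUNT_HW_CACHE_RESULT_MISS);

    printf("branch-loads       config = %#llx\n", (unsigned long long)loads);
    printf("branch-load-misses config = %#llx\n", (unsigned long long)misses);
}
```

Calling print_branch_cache_configs() from a main() should report 0x5 for branch-loads and 0x10005 for branch-load-misses, since PERF_COUNT_HW_CACHE_BPU is 5, OP_READ is 0, and RESULT_MISS is 1.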
Finally, the kernel maps them to the hardware events BR_INST_RETIRED.ALL_BRANCHES and BR_MISP_RETIRED.ALL_BRANCHES:
    static __initconst const u64 skl_hw_cache_event_ids
                    [PERF_COUNT_HW_CACHE_MAX]
                    [PERF_COUNT_HW_CACHE_OP_MAX]
                    [PERF_COUNT_HW_CACHE_RESULT_MAX] =
    {
    ...
     [ C(BPU ) ] = {
        [ C(OP_READ) ] = {
            [ C(RESULT_ACCESS) ] = 0xc4,    /* BR_INST_RETIRED.ALL_BRANCHES */
            [ C(RESULT_MISS)   ] = 0xc5,    /* BR_MISP_RETIRED.ALL_BRANCHES */
        },
        [ C(OP_WRITE) ] = {
            [ C(RESULT_ACCESS) ] = -1,
            [ C(RESULT_MISS)   ] = -1,
        },
        [ C(OP_PREFETCH) ] = {
            [ C(RESULT_ACCESS) ] = -1,
            [ C(RESULT_MISS)   ] = -1,
        },
     },
    ...
    };
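To illustrate how a config value selects an entry in a table of this shape, here is a simplified user-space sketch. It is not the kernel's actual lookup code: the table demo_cache_event_ids is a reduced copy holding only the BPU entries quoted above, and the names demo_lookup_cache_event and demo_usage are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <linux/perf_event.h>

/*
 * Reduced, illustrative copy of the BPU row quoted above.
 * -1 marks unsupported combinations; cache types that are left out
 * here are zero-initialized.
 */
static const int64_t demo_cache_event_ids
        [PERF_COUNT_HW_CACHE_MAX]
        [PERF_COUNT_HW_CACHE_OP_MAX]
        [PERF_COUNT_HW_CACHE_RESULT_MAX] = {
    [PERF_COUNT_HW_CACHE_BPU] = {
        [PERF_COUNT_HW_CACHE_OP_READ] = {
            [PERF_COUNT_HW_CACHE_RESULT_ACCESS] = 0xc4,
            [PERF_COUNT_HW_CACHE_RESULT_MISS]   = 0xc5,
        },
        [PERF_COUNT_HW_CACHE_OP_WRITE] = {
            [PERF_COUNT_HW_CACHE_RESULT_ACCESS] = -1,
            [PERF_COUNT_HW_CACHE_RESULT_MISS]   = -1,
        },
        [PERF_COUNT_HW_CACHE_OP_PREFETCH] = {
            [PERF_COUNT_HW_CACHE_RESULT_ACCESS] = -1,
            [PERF_COUNT_HW_CACHE_RESULT_MISS]   = -1,
        },
    },
};

/*
 * Hypothetical helper: split config into its three byte-wide fields
 * and index the table. Returns -1 for out-of-range or unsupported
 * combinations, and 0 for entries missing from this reduced table.
 */
int64_t demo_lookup_cache_event(uint64_t config)
{
    unsigned int type   = config & 0xff;
    unsigned int op     = (config >> 8) & 0xff;
    unsigned int result = (config >> 16) & 0xff;

    if (type >= (unsigned int)PERF_COUNT_HW_CACHE_MAX ||
        op >= (unsigned int)PERF_COUNT_HW_CACHE_OP_MAX ||
        result >= (unsigned int)PERF_COUNT_HW_CACHE_RESULT_MAX)
        return -1;

    return demo_cache_event_ids[type][op][result];
}

/* Usage: the two configs from the strace output resolve to 0xc4 and 0xc5. */
void demo_usage(void)
{
    assert(demo_lookup_cache_event(0x5) == 0xc4);      /* branch-loads */
    assert(demo_lookup_cache_event(0x10005) == 0xc5);  /* branch-load-misses */
}
```

The write and prefetch rows return -1 because those operations are marked unsupported for the BPU in the quoted table.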
More mappings on Intel Skylake:
- cache-misses: LONGEST_LAT_CACHE.MISS
- cache-references: LONGEST_LAT_CACHE.REFERENCE
- branch-loads: BR_INST_RETIRED.ALL_BRANCHES
- branch-load-misses: BR_MISP_RETIRED.ALL_BRANCHES
- L1-dcache-loads: MEM_INST_RETIRED.ALL_LOADS
- L1-dcache-load-misses: L1D.REPLACEMENT
- L1-dcache-stores: MEM_INST_RETIRED.ALL_STORES
- L1-icache-load-misses: ICACHE_64B.IFTAG_MISS
- LLC-loads: OFFCORE_RESPONSE
- LLC-load-misses: OFFCORE_RESPONSE
- LLC-stores: OFFCORE_RESPONSE
- LLC-store-misses: OFFCORE_RESPONSE
- dTLB-loads: MEM_INST_RETIRED.ALL_LOADS
- dTLB-load-misses: DTLB_LOAD_MISSES.WALK_COMPLETED
- dTLB-stores: MEM_INST_RETIRED.ALL_STORES
- dTLB-store-misses: DTLB_STORE_MISSES.WALK_COMPLETED
- iTLB-loads: ITLB_MISSES.STLB_HIT
- iTLB-load-misses: ITLB_MISSES.WALK_COMPLETED