How large is the branch prediction buffer for a typical modern CPU?

Question

The application that I'm dealing with has a large number of a if-statements with the characteristics that in any one execution, only one of the branches is executed 90% of the time.

Now, I can test the impact of branch prediction on a single if-statement for a specific CPU by doing something like this :-

#include <iostream>
#include <stdlib.h>

using namespace std;

int main() {
  int a;
  cin>>a;
  srand(a);
  int b;

  long count=0;

  for (int i=0; i<10000; i++) {
    for (int j=0; j<65535; j++) {
      b = rand() % 30 + 1;
      if (b > 15) // This can be changed to get statistics for different %-ages
        count += (b+10);
    }
  }

  cout << count <<"\n";
}

My question is, is there a way of testing the scalability and impact of branch prediction with multiple if-statements in an actual large application for a given CPU?

Basically, I want to be able to figure out how much branch mispredicts are costing on various CPUs and their impact on the application.

don't know about amd's line, but intel processors have a series of debug registers that should keep stats on things like branch prediction. if you can get into them, it'd be a simple matter to get the "total branches" v.s. "total failed/succeeded predictions" counts. — Marc B, Sep 10 '12 at 16:06
Why does the question title ask about the size of the branch prediction buffer? Your question body is about something completely different: how to get information on the branch mispredict penalty you incur in your code. Decide on *one* thing to ask, and then update both question title and body to reflect *that* question :) — jalf, Sep 10 '12 at 16:32
All modern CPUs *do* have a branch prediction buffer. The problem is that what they *use* it for (or rather, how they use it) varies, and the surrounding branch prediction logic varies. So simply looking at the size of that buffer tells you virtually nothing. The branch predictor uses many different heuristics to improve its success rate. — jalf, Sep 10 '12 at 16:33
I think that the closest you can get to answering your question is by reading "The microarchitecture of Intel, AMD and VIA CPUs: An optimization guide for assembly programmers and compiler makers" that can be found on http://www.agner.org/optimize/. To get rid of those nasty if-chains I'd suggest you look into implementing some kind of JIT compiler. — CAFxX, Sep 10 '12 at 17:36
This is especially interesting given the findings in http://stackoverflow.com/q/11227809/53897 — Thorbjørn Ravn Andersen, Aug 24 '13 at 19:26

score 4 · Answer 1 · answered Sep 10 '12 at 18:04

You need to take into a account the complexity of your branches, the compiler may remove branches using architecture specific operation codes like CMOV (compare and move).

Your simple example code

if (b > 15)
    count += (b+10);

Here's the code compiled into machine language

;; assembly x86 FASM/NASM syntax

;; WITH branching
MOV ebx, [b] ;; b
MOV ecx, [count] ;; count
CMP ebx, 15 ;; if condition to set flags
JLE .skip ;; { branch/jump over the if body when less than or equal
LEA eax, [ecx + ebx + 10] ;; count + b+10
MOV [count], eax ;; store count
.skip: ;; } label after the if block

;; WITHOUT branching
MOV ebx, [b] ;; b
MOV ecx, [count] ;; count
LEA eax, [ecx + ebx + 10] ;; pre-calc avoiding the need to branch
CMP ebx, 15 ;; if condition to set flags
CMOVLE eax, ecx ;; make eax equal to ecx (current count) when less than or equal
            ;; avoiding the branch/jump
MOV [count], eax ;; store count

So unless you know how your optimizing compiler is optimizing your code, it's a little difficult to profile branch prediction. If you are checking your machine code output and know you have a lot of J[condition] statements then using the code profiling tool mentioned in the comments are sufficient. Trying to roll your own branch prediction test without using the proper architecture debugging registers will lead to the situation I demonstrated above.

How large is the branch prediction buffer for a typical modern CPU?

1 Answers1