
I am working on an assignment for school. Essentially, we are analyzing sorting algorithms and their costs on large sets of numbers. We have a best case (already in order), a worst case (reverse order), and an average case (random order). However, for almost all of my sorting algorithms, sorting the worst case takes less time than the average case. After some reading, it definitely seems like branch prediction is causing this: it recognizes the pattern (decreasing order) and executes the code faster than the big-O analysis alone would predict.

I've done some research on branch prediction, and while there appear to be ways to optimize for it, I can't find anything on disabling it entirely. Is there a G++ flag I can use? Or a terminal command?

This is an example of my bubble sort algorithm:

#include <cstddef>
#include <vector>   // the function assumes std::vector (e.g. via "using namespace std;")

void bubble(std::vector<long> &v) {
    // Classic bubble sort: compare adjacent elements and swap them into order.
    // The if below is the data-dependent branch discussed in the question.
    const std::size_t n = v.size();
    if (n < 2) return;               // also avoids n - 1 wrapping around for an empty vector
    for (std::size_t i = 0; i < n - 1; i++) {
        for (std::size_t j = 0; j < n - i - 1; j++) {
            if (v[j] > v[j + 1]) {
                long tmp = v[j];
                v[j] = v[j + 1];
                v[j + 1] = tmp;
            }
        }
    }
}

My timings for the average case are almost double those of the worst case.
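For reference, here is a minimal sketch of how the three input cases could be generated and timed. The actual harness isn't shown in the question; the element count, the `std::mt19937` seed, and the `std::chrono`-based timing are illustrative assumptions only.

#include <algorithm>
#include <chrono>
#include <cstddef>
#include <iostream>
#include <random>
#include <string>
#include <vector>

// Build one test case: "best" (already sorted), "worst" (reverse order),
// or "average" (shuffled with a fixed seed so runs are repeatable).
std::vector<long> make_input(std::size_t n, const std::string &kind) {
    std::vector<long> v(n);
    for (std::size_t i = 0; i < n; i++) v[i] = static_cast<long>(i);
    if (kind == "worst") std::reverse(v.begin(), v.end());
    if (kind == "average") {
        std::mt19937 rng(12345);          // arbitrary fixed seed
        std::shuffle(v.begin(), v.end(), rng);
    }
    return v;
}

// Time one call to bubble() (the function above); takes the vector by value
// so the caller's copy is untouched. Returns seconds.
double time_sort(std::vector<long> v) {
    auto start = std::chrono::steady_clock::now();
    bubble(v);
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

int main() {
    const std::size_t n = 100000;         // big enough that the O(n^2) work dominates
    for (const char *kind : {"best", "worst", "average"})
        std::cout << kind << ": " << time_sort(make_input(n, kind)) << " s\n";
}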

  • "*My timings for the average case are almost double those of the worst case.*" What's wrong with that? Or rather, what's the problem? Big-O notation does not guarantee real-world performance; it only describes algorithmic complexity. – Nicol Bolas Jul 10 '19 at 16:28
  • @NicolBolas: It's the opposite of what "worst" means. You're right that "worst-case complexity" and "worst-case runtime" do not have to align. – Ben Voigt Jul 10 '19 at 16:29
  • `__builtin_expect` might work for you: https://stackoverflow.com/questions/30130930/is-there-a-compiler-hint-for-gcc-to-force-branch-prediction-to-always-go-a-certa – NathanOliver Jul 10 '19 at 16:30
  • Reverse order is not the worst case for all algorithms. For bubble sort, perhaps. – n. m. could be an AI Jul 10 '19 at 16:32
  • The worst case for quicksort depends on your pivot selection. – molbdnilo Jul 10 '19 at 16:34
  • How many elements are in the vector you are sorting? For the record, branch prediction is a very low-level CPU feature that you probably can't disable (but you or the compiler could make the code not benefit from it). – Max Langhof Jul 10 '19 at 16:35
  • Anyway, does your school assignment involve *complexity* or *real world performance*? The former is not affected by branch prediction (you cannot analyse complexity by looking at execution times) and the latter is probably too complicated for a school project. – n. m. could be an AI Jul 10 '19 at 16:37
  • @NathanOliver `__builtin_expect` is not useful for random order because it's unknown at compile-time whether it's more likely that the branch will be taken or not. – Hadi Brais Jul 10 '19 at 20:29
  • @molbdnilo What does the traditional implementation of bubble sort look like? I think the OP's implementation is very common. But yes, it can be slightly improved as discussed in the [Wikipedia article](https://en.wikipedia.org/wiki/Bubble_sort#Optimizing_bubble_sort). Although the branch misprediction problem will remain almost the same. – Hadi Brais Jul 10 '19 at 20:38
  • @NathanOliver: `__builtin_expect` has very little to do with branch *prediction* when compiling for modern x86-64. There aren't any hardware branch hints (only Pentium 4 had that). The actual effect is on branch *layout*, telling GCC which direction is common so it can make that case fast (e.g. put it on the not-taken path and group the hot code into fewer L1i cache lines.) Also, `__builtin_expect` probably makes if-conversion of branching into branchless `cmov` less likely. – Peter Cordes Jul 11 '19 at 04:20
  • @molbdnilo: this looks exactly like standard Bubble Sort, except it counts upward instead of downward toward zero. Are you talking about the early-out condition? If you want to add complexity to gain performance, switch to InsertionSort instead, that's much better than early-out BubbleSort. See [Bubble Sort: An Archaeological Algorithmic Analysis](https://users.cs.duke.edu/~ola/papers/bubble.pdf) (i.e. how did Bubble Sort become popular despite sucking). The early-out on a no-swap pass was a later addition to classic Bubble Sort. The main advantage is tiny machine-code size, not perf. – Peter Cordes Jul 11 '19 at 04:24
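Tying together the comments above about `__builtin_expect` and `cmov`: branch prediction is a hardware feature, and G++ has no flag (nor is there a terminal command) to switch the CPU's predictor off, but the data-dependent branch can be written out of the inner loop so there is nothing left to mispredict. The following is only an illustrative sketch; whether the compiler actually emits conditional moves for the `std::min`/`std::max` pair depends on the compiler and optimization flags.

#include <algorithm>
#include <cstddef>
#include <vector>

// Branchless variant of the inner-loop compare-and-swap: always store both
// slots using min/max, so there is no data-dependent branch for the CPU's
// predictor to guess (compilers can often, but not always, lower this to cmov).
void bubble_branchless(std::vector<long> &v) {
    const std::size_t n = v.size();
    if (n < 2) return;
    for (std::size_t i = 0; i < n - 1; i++) {
        for (std::size_t j = 0; j < n - i - 1; j++) {
            long lo = std::min(v[j], v[j + 1]);
            long hi = std::max(v[j], v[j + 1]);
            v[j] = lo;            // unconditional stores: same work for every input order
            v[j + 1] = hi;
        }
    }
}

This version does roughly the same amount of work regardless of the input order, so any remaining timing differences between the three cases are not coming from mispredicted compare branches.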

1 Answer


Big-O notation is all about asymptotic behavior. In other words, it describes what factors become dominant as the problem size gets bigger.

CPU features like prefetching and branch prediction can have large relative effects at smaller sizes. But an O(n^2) procedure will, by its nature, eventually become slower than an O(n) one once the problem size is big enough.

So don't bother speculating or worrying about the effects of branch prediction. Just make your array bigger. Try sorting an array of 1 million elements. Or 1 billion. I guarantee you: if you're right about one situation being the worst case, it will get slower than the best case. (Hint: you're not right about that.)
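One way to act on that advice is to sweep the problem size and watch how the case timings scale; for example, the single-size main() in the hypothetical harness sketched under the question could be replaced with something along these lines (the sizes chosen here are arbitrary):

#include <cstddef>
#include <iostream>

// Sweep the problem size: as n grows, the Theta(n^2) sorting work dwarfs any
// constant-factor effect from branch prediction, so the relative ordering of
// the three cases settles down. Uses the hypothetical make_input()/time_sort()
// helpers from the earlier sketch (or any equivalent harness).
int main() {
    for (std::size_t n : {10000, 50000, 100000, 500000}) {
        std::cout << "n = " << n << '\n';
        for (const char *kind : {"best", "worst", "average"})
            std::cout << "  " << kind << ": " << time_sort(make_input(n, kind)) << " s\n";
    }
}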

Sneftel