0

Here's my demo program:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int cmp(const void *d1, const void *d2)
{
    int a, b;

    a = *(int const *) d1;
    b = *(int const *) d2;

    if (a > b)
        return 1;
    else if (a == b)
        return 0;

    return -1;
}

int main()
{
    int seed = time(NULL);
    srandom(seed);

    int i, n, max = 32768, a[max];

    for (n=0; n < max; n++) {

        int r = random() % 256;
        a[n] = r;

    }

    qsort(a, max, sizeof(int), cmp);

    clock_t beg = clock();

    long long int sum = 0;

    for (i=0; i < 20000; i++) 
    {
        for (n=0; n < max; n++) {
            if (a[n] >= 128)
                sum += a[n];
        }
    }

    clock_t end = clock();

    double sec = (end - beg) / CLOCKS_PER_SEC;

    printf("sec: %f\n", sec);
    printf("sum: %lld\n", sum);

    return 0;
}



unsorted
sec: 5.000000
sum: 63043880000

sorted
sec: 1.000000
sum: 62925420000

Here's an assembly diff of two versions of the program, one with qsort and one without:

--- unsorted.s  
+++ sorted.s    
@@ -58,7 +58,7 @@
    shrl    $4, %eax
    sall    $4, %eax
    subl    %eax, %esp
-   leal    4(%esp), %eax
+   leal    16(%esp), %eax
    addl    $15, %eax
    shrl    $4, %eax
    sall    $4, %eax
@@ -83,6 +83,13 @@
    movl    -16(%ebp), %eax
    cmpl    -24(%ebp), %eax
    jl  .L7
+   movl    -24(%ebp), %eax
+   movl    $cmp, 12(%esp)
+   movl    $4, 8(%esp)
+   movl    %eax, 4(%esp)
+   movl    -32(%ebp), %eax
+   movl    %eax, (%esp)
+   call    qsort
    movl    $0, -48(%ebp)
    movl    $0, -44(%ebp)
    movl    $0, -12(%ebp)

As far as I understand the assembly output, the sorted version just has more code due to passing values to qsort, but I don't see any branching optimization/prediction/whatever thing. Maybe I'm looking in the wrong direction?

houbysoft
  • 32,532
  • 24
  • 103
  • 156
  • 2
    Branch-prediction is done by the CPU, not by the compiler. – Oliver Charlesworth Jul 01 '12 at 14:52
  • Isn't this a `C` duplicate of [Why is processing a sorted array faster than an unsorted array](http://stackoverflow.com/questions/11227809)? – Blastfurnace Jul 01 '12 at 14:57
  • @Blastfurnace: no. That question asks why processing a sorted array is faster than an unsorted array, and the answer is branch prediction. This question asks how come that none of the branch prediction can be seen in assembly. – houbysoft Jul 01 '12 at 15:01
  • Note: `double sec = (end - beg) / CLOCKS_PER_SEC;` is computed in integer values only; you'll have to find a way to cast it to float. – wildplasser Jul 01 '12 at 15:02
  • @wildplasser, yeah missed that... –  Jul 01 '12 at 17:45
  • you are changing the seed each run, you cannot compare the time for one run to the next until you fix the seed and make the test repeatable. Verify with a fixed seed you get the same dataset every run, add a loop to take the checksum. Your printout already shows that you are comparing apples to oranges, the sum has to be the same to compare one run to the next. – old_timer Jul 02 '12 at 05:37

2 Answers2

5

Branch prediction is not something you will see at the assembly code level; it is done by the CPU itself.

houbysoft
  • 32,532
  • 24
  • 103
  • 156
  • The compiler may emit a static branch prediction hint (`0x3e`) if the code is profiled optimized for the Prescott. But it didn't have any measurable performance benefits back then, that's why all newer processors ignore that branch hint silently. – Gunther Piez Jul 01 '12 at 18:36
0

Built-in Function: long __builtin_expect (long exp, long c)

You may use __builtin_expect to provide the compiler with branch prediction information. In general, you should prefer to use actual profile feedback for this (-fprofile-arcs), as programmers are notoriously bad at predicting how their programs actually perform. However, there are applications in which this data is hard to collect.

The return value is the value of exp, which should be an integral expression. The semantics of the built-in are that it is expected that exp == c. For example:

if (__builtin_expect (x, 0))
  foo ();

indicates that we do not expect to call foo, since we expect x to be zero. Since you are limited to integral expressions for exp, you should use constructions such as

if (__builtin_expect (ptr != NULL, 1))
  foo (*ptr);

when testing pointer or floating-point values.

Otherwise the branch prediction is determined by the processor...

Branch prediction predicts the branch target and enables the processor to begin executing instructions long before the branch true execution path is known. All branches utilize the branch prediction unit (BPU) for prediction. This unit predicts the target address not only based on the EIP of the branch but also based on the execution path through which execution reached this EIP. The BPU can efficiently predict the following branch types:

• Conditional branches.

• Direct calls and jumps.

• Indirect calls and jumps.

• Returns.


The microarchitecture tries to overcome this problem by feeding the most probable branch into the pipeline and execut[ing] it speculatively.

...Using various methods of branch prediction.

veganaiZe
  • 539
  • 5
  • 13