float vs double performance testing on beagle bone black

Question

I am doing some image processing on my Beaglebone Black and am interested in the performance gain of using floats vs doubles in my algorithm.

I've tried to devise a simple test for this:

main.c

#define MAX_TEST 10
#define MAX_ITER 1E7
#define DELTA 1E-8 

void float_test()
{
    float n = 0.0;
    for (int i=0; i<MAX_ITER; i++)
    {
        n += DELTA;
        n /= 3.0;
    }
}


void double_test()
{
    double n = 0.0;
    for (int i=0; i<MAX_ITER; i++)
    {
        n += DELTA;
        n /= 3.0;
    }
}


int main()
{
    for (int i=0; i<MAX_TEST; i++)
    {
        double_test();
        float_test();
    }

    return 0;
}

ran as:

gcc -Wall -pg main.c  -std=c99
./a.out
gprof a.out gmon.out -q > profile.txt

profile.txt:

granularity: each sample hit covers 4 byte(s) for 0.03% of 35.31 seconds

index % time    self  children    called     name
                                                 <spontaneous>
[1]    100.0    0.00   35.31                 main [1]
               18.74    0.00      10/10          float_test [2]
               16.57    0.00      10/10          double_test [3]
-----------------------------------------------
               18.74    0.00      10/10          main [1]
[2]     53.1   18.74    0.00      10         float_test [2]
-----------------------------------------------
               16.57    0.00      10/10          main [1]
[3]     46.9   16.57    0.00      10         double_test [3]
-----------------------------------------------

I am not sure if the compiler is optimizing away some of my code or if I am doing enough arithmetic for it to matter. I find it a bit odd that the double_test() is actually taking less time than the float_test().

I've tried switching the order in which the functions are called and the results are still the same. Could somebody explain this to me?

If you are not sure whether the compiler is optimizing away stuff, why don't you make your program receive crucial inputs (or at the very least read them from volatile locations) and print the computed output? PS: yes, a half-decent optimizing compiler should remove all computations PS2: why are you even measuring the time taken by a program that isn't compiled with optimization? There doesn't need to be any sense to the relative times taken by `double` and `float` then. — Pascal Cuoq, Aug 07 '14 at 15:43
C allows FP computation to occur at higher precision than needed. See `FLT_EVAL_METHOD` setting. — chux - Reinstate Monica, Aug 07 '14 at 16:36

score 0 · Answer 1 · answered Aug 07 '14 at 23:29

On my machine (x86_64), looking at the code generated, side by side:

double_test:                                  .. float_test:
  xorpd  %xmm0,%xmm0  // double n             --   xorps  %xmm0,%xmm0      // float n
  xor    %eax,%eax    // int i                ==   xor    %eax,%eax
loop:                                         .. loop:
                                              ++   unpcklps %xmm0,%xmm0    // Extend float n to...
                                              ++   cvtps2pd %xmm0,%xmm0    // ...double n
  add    $0x1,%eax     // ++i                 ==   add    $0x1,%eax
  addsd  %xmm2,%xmm0   // double n += DELTA   ==   addsd  %xmm2,%xmm0
  cvtsi2sd %eax,%xmm3  // (double)i           ==   cvtsi2sd %eax,%xmm3
                                              ++   unpcklpd %xmm0,%xmm0    // Reduce double n to...
                                              ++   cvtpd2ps %xmm0,%xmm0    // ...float n
  divsd  %xmm5,%xmm0   // double n /= 3.0     --   divss  %xmm4,%xmm0      // float n / 3.0
  ucomisd %xmm3,%xmm1  // (double)i cmp 1E7   ==   ucomisd %xmm3,%xmm1
  ja      ...loop...   // if (double)i < 1E7  ==   ja      ...loop...

showing four extra instructions to change up to double and back down to float in order to add DELTA.

DELTA is 1E-8, which is implicitly a double, so the addition is done in double precision. Of course, 3.0 is also implicitly a double, but it is exactly representable as a float, and I guess the compiler spots that there is no effective difference between double and single precision for the division in this case.
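As a small check of the promotion described above, the sketch below uses `sizeof` on the two kinds of sum. The function names are made up for illustration and are not part of the original test program. `sizeof` does not evaluate its operand, so this only reports the type the compiler assigns to the expression.

```c
#include <stddef.h>

/* Hypothetical helper: size of the type of (float + unsuffixed constant).
 * 1E-8 has type double, so the float operand is converted and the sum is a double. */
size_t size_of_sum_with_double_constant(void)
{
    float n = 0.0f;
    return sizeof(n + 1E-8);
}

/* Hypothetical helper: size of the type of (float + f-suffixed constant).
 * 1E-8f has type float, so the sum stays a float. */
size_t size_of_sum_with_float_constant(void)
{
    float n = 0.0f;
    return sizeof(n + 1E-8f);
}
```

The first function should return `sizeof(double)` and the second `sizeof(float)`, which matches the extra convert instructions seen only around the add in float_test.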

Defining DELTAF as 1E-8f gets rid of the change up to and down from double for the add.
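A minimal sketch of that change, with a few assumptions of my own: DIVISORF and the function name `float_test_f` are hypothetical, the iteration count is passed in as an `int` (so the loop compare is not done in double, as it is with MAX_ITER defined as 1E7), and the result is returned so an optimizing compiler cannot simply discard the loop.

```c
#define DELTAF   1E-8f   /* float constant: the add stays in single precision */
#define DIVISORF 3.0f    /* hypothetical name: float constant for the divide */

/* Variant of float_test that uses only float constants.
 * iterations is an int, so the loop condition is an integer compare. */
float float_test_f(int iterations)
{
    float n = 0.0f;
    for (int i = 0; i < iterations; i++)
    {
        n += DELTAF;
        n /= DIVISORF;
    }
    return n;   /* returning n keeps the computation observable */
}
```

The value converges towards DELTAF / 2, so printing the return value in main gives a quick sanity check that the loop actually ran. I have not timed this variant on a Beaglebone Black, so treat it as a starting point for the measurement rather than a result.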
