
I was writing some templated code to benchmark a numeric algorithm using both floats and doubles, in order to compare against a GPU implementation.

I found that the float version was slower. Profiling with Intel VTune Amplifier showed that g++ was generating extra x86 instructions (cvtps2pd/cvtpd2ps and unpcklps/unpcklpd) to convert some intermediate results from float to double and then back again. The performance degradation is almost 10% for this application.

After compiling with the flag -Wdouble-promotion (which BTW is not included with -Wall or -Wextra), sure enough g++ warned me that the results were being promoted.

I reduced this to the simple test case shown below. Note that the way the C++ statements are written affects the result: the compound statement (T d1 = log(r)/r;) produces a warning, whilst the separated version (T d = log(r); d/=r;) does not.

The following was compiled with both g++-4.6.3-1ubuntu5 and g++-4.7.3-2ubuntu1~12.04 with the same results.

Compile flags are:

g++-4.7 -O2 -Wdouble-promotion -Wextra -Wall -pedantic -Werror -std=c++0x test.cpp -o test

#include <cstdlib>
#include <iostream>
#include <cmath>

template <typename T>
T f()
{
        T r = static_cast<T>(0.001);

        // Gives no double promotion warning
        T d = log(r);
        d/=r;
        // Promotes to double
        T d1 = log(r)/r;

        return d+d1;
}

int main()
{
        float f1 = f<float>();
        std::cout << f1 << std::endl;
}

I realise that the C++11 standard allows the compiler discretion here. But why does the order matter?

Can I explicitly instruct g++ to use floats only for this calculation?

EDIT: SOLVED by Mike Seymour. Needed to use std::log to ensure picking up the overloaded version of log instead of calling the C double log(double). The warning was not generated for the separated statement because this is a conversion and not a promotion.

amckinley

1 Answer


The problem is

log(r)

In this implementation, it seems that the only log in the global namespace is the C library function, double log(double). Remember that it's unspecified whether the C++ versions of the C library headers (such as <cmath>) also dump their declarations into the global namespace as well as into namespace std.

You want

std::log(r)

to ensure that the extra overloads defined by the C++ library are available.

Mike Seymour
  • It looks like you are correct. I thought that log would have been calling the overloaded version float log(float x). So in that case, why does the separated version not produce a promotion? – amckinley Sep 10 '13 at 14:22
  • As for why `T d = log(r);` doesn't warn, it may be that gcc optimizes `(float)log((double)x)` to `logf(x)` when x is a float. – Marc Glisse Sep 10 '13 at 14:24
  • @amckinley: That produces a conversion, not a promotion. You'd need `-Wconversion` to catch that. – Mike Seymour Sep 10 '13 at 14:28
  • Ok I discovered that the lack of warning is actually the problem. After examining the generated code using objdump, both statements produce almost exactly the same code with the extra cvtps2pd/cvtpd2ps and unpcklps/unpcklpd instructions. So the problem was that `double log(double)` was being called and also that g++ was not warning about the double promotion in the second case. EDIT: And apparently it's not supposed to. I needed -Wconversion to catch this. – amckinley Sep 10 '13 at 14:30
  • @MarcGlisse: That's unlikely, unless the compiler knows that both operations give exactly the same results for all inputs (which they probably don't). And it's not happening here (at least with my version of GCC), as compiling with `-Wconversion` would demonstrate. – Mike Seymour Sep 10 '13 at 14:31