I'm looking at the CUDA header file cuda/6.5.14/RHEL6.x/include/math_functions_dbl_ptx1.h and see that every arithmetic function that takes a double argument casts it to float:

static __forceinline__ double fabs(double a)
{
  return (double)fabsf((float)a);
}

...

static __forceinline__ double floor(double a)
{
  return (double)floorf((float)a);
}

Since I rely in an essential way on double-precision floating point (there are quite a few potentially catastrophic cancellations in the code), I have some trouble believing my own eyes.

Could you explain what's going on here?

Michael

1 Answer

The file you're looking at is used when compiling (on CUDA 6.5) for a cc1.1 or cc1.2 device. Those devices did not have native support for double arithmetic, and yes, CUDA would "quietly" demote double to float. (The compiler would emit a warning when this occurred.)

This behavior did not manifest itself on devices of compute capability 1.3 and higher, all of which have native support for double arithmetic.
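To get a feel for what such a demotion would do to a calculation that depends on double precision, here is a minimal host-side sketch in plain C++. It only mimics the cast pattern from the header; the names `demoted_floor`, `demoted_fabs`, and `print_demotion_effect` are hypothetical and not part of CUDA.

```cpp
#include <cassert>
#include <cmath>
#include <cstdio>

// Hypothetical stand-ins that copy the cast pattern from the header:
// the argument is narrowed to float before the operation is applied.
static double demoted_floor(double a)
{
    return static_cast<double>(std::floor(static_cast<float>(a)));
}

static double demoted_fabs(double a)
{
    return static_cast<double>(std::fabs(static_cast<float>(a)));
}

// Prints the demoted and the true double results side by side.
static void print_demotion_effect()
{
    // 2^24 + 1.5 is exact in double, but float spacing near 2^24 is 2,
    // so the narrowing cast rounds it to 16777218 before floor runs.
    const double a = 16777217.5;
    std::printf("floor: demoted=%.1f double=%.1f\n",
                demoted_floor(a), std::floor(a));

    // 1e-10 is far below float's resolution near 1.0, so the small
    // term that a later cancellation would need is lost in the cast.
    const double b = 1.0 + 1e-10;
    std::printf("fabs-1: demoted=%.3e double=%.3e\n",
                demoted_fabs(b) - 1.0, std::fabs(b) - 1.0);

    // Sanity check: the two floor results really do differ.
    assert(demoted_floor(a) != std::floor(a));
}
```

Calling `print_demotion_effect()` from a `main` shows the demoted path returning 16777218 where the double path returns 16777217, and the difference from 1.0 collapsing to exactly zero.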

CUDA 7 and 7.5 no longer support devices that have a compute capability less than 2.0, so this particular behavior could no longer manifest itself, and it becomes only of historical interest on newer CUDA toolkits. (And the file in question has been removed from these newer CUDA toolkits.)

For reference, when this "demotion" was occurring, the compiler would emit a warning of the following form:

ptxas /tmp/tmpxft_00000949_00000000-2_samplefilename.ptx, line 65; warning : Double is not supported. Demoting to float

If you don't see that warning in your compile output, the demotion is not occurring.
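If you would rather not rely on spotting a warning, a guard along the following lines may help. This is only a sketch: `has_native_double` is a hypothetical helper, not a CUDA API. The one CUDA-specific piece is the `__CUDA_ARCH__` macro, which nvcc defines during device compilation (130 for sm_13, 200 for sm_20, and so on); in an ordinary host compile the macro is undefined, so the `#error` branch is skipped.

```cpp
// Hypothetical helper: true when a device with the given compute
// capability has native double support (cc 1.3 and higher).
constexpr bool has_native_double(int major, int minor)
{
    return major > 1 || (major == 1 && minor >= 3);
}

// Compile-time checks of the helper itself.
static_assert(!has_native_double(1, 2), "cc1.2 has no native double");
static_assert(has_native_double(1, 3), "cc1.3 has native double");

// Stop the build outright if device code is being generated for an
// architecture below sm_13, instead of letting double be demoted.
#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ < 130)
#error "Target architecture lacks native double support"
#endif
```

At run time, the `major` and `minor` fields of the `cudaDeviceProp` structure filled in by `cudaGetDeviceProperties` could be passed to the helper. On the build side, requesting a target such as `-arch=sm_13` or higher from `nvcc` avoids the cc1.1/cc1.2 code path in the first place.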

Robert Crovella
  • When you create a .cu file in Visual Studio, I think it sets compute capability 1.0 by default. I see many other people having this issue in the future. – The Vivandiere Oct 06 '15 at 23:49
  • I see almost nobody having this issue in the future, whatever "this issue" is. This demotion capability has been removed from the CUDA toolchain for CUDA 7 and all future CUDA versions. The default compute capability [varies by CUDA toolkit version](http://stackoverflow.com/questions/28932864/cuda-compute-capability-requirements) and CUDA 7 and beyond do not set a default of 1.0. – Robert Crovella Oct 06 '15 at 23:53
  • Even in CUDA 6.5, the *default* target architecture for `nvcc` is sm_20. If I recall correctly, the last version of CUDA that used sm_1x as the default target architecture was CUDA 6.0, which shipped two years ago. So I agree with Robert Crovella that the double-to-float demotion should be a non-issue going forward, as the functionality and file in question were completely removed in CUDA 7.0 and most sm_1x GPUs have been retired/scrapped by now. – njuffa Oct 07 '15 at 02:30