
Does CUDA support double precision floating point numbers?

Also, what are the reasons for this?

cuda-dev

4 Answers


If your GPU has compute capability 1.3 then you can do double precision (see the [“Version features and feature specifications” section of the CUDA Wikipedia page](https://en.m.wikipedia.org/wiki/CUDA#Version_features_and_specifications) for what “compute capability” means). You should be aware, though, that 1.3 hardware has only one double precision FP unit per MP, which has to be shared by all the threads on that MP, whereas there are 8 single precision FPUs, so each active thread has its own single precision FPU. In other words, you may well see 8x worse performance with double precision than with single precision.
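If you're not sure what your card supports, here is a minimal sketch for querying the compute capability at runtime with the standard CUDA runtime API; the threshold just encodes the 1.3 requirement mentioned above:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    // Query the properties of device 0 (adjust the index for multi-GPU systems)
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    printf("Compute capability: %d.%d\n", prop.major, prop.minor);

    // Double precision requires compute capability 1.3 or higher
    if (prop.major > 1 || (prop.major == 1 && prop.minor >= 3))
        printf("Double precision is supported.\n");
    else
        printf("Doubles will be demoted to floats on this device.\n");
    return 0;
}
```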

Paul R
  • Thanks for the tip, Paul. I wanted to switch to `double` precision mostly for accuracy. I'm consulting on a side-project where I'm converting Python code to C++ / CUDA and the Python code uses `double` precision everywhere. I noticed that when I switched to using `float` I had a maximum absolute difference of `1e-06` in the results. I wasn't too satisfied with that, but I'd rather take the bullet with the accuracy than the performance. Thanks! +1. – rayryeng Apr 28 '17 at 07:41
  • Ha - commenting on 7-year-old answers now, Ray? ;-) Seriously though, this may be a bit out of date now - I haven't played with CUDA for a few years and the latest nVidia hardware may well have better double precision support by now, for all I know. – Paul R Apr 28 '17 at 08:54
  • Hehe, I didn't notice the year. I looked up the capability before I commented :). The card I'm working on for my client only has compute capability 3.0, and double precision there is still only half that of single precision. It has only been fully supported since 6.0... Pity. Thanks nonetheless, even if this was 7 years old! – rayryeng Apr 28 '17 at 08:57
  • One other thing to consider is that if the GPU is old, but the CPU is reasonably new (and particularly if it has a good number of cores), then you may get better results with a good FFT library (e.g. FFTW) on the CPU, which is a lot easier to implement and manage. Anyway, good luck with whichever route you go down! – Paul R Apr 28 '17 at 09:18
  • @rayryeng Can you please share how you went on to solve your problem? – รยקคгรђשค Aug 30 '17 at 16:07
  • @Suparshva What problem would that be? I ended up not using double-precision floating point primarily due to performance. I simply wrote all of my kernels to use single precision instead. – rayryeng Aug 30 '17 at 16:08
  • @rayryeng Ohh, OK :D ... I misunderstood your first comment and thought you had gone on to improve accuracy while maintaining reasonable performance, as Paul recommended ... thanks anyway – รยקคгรђשค Aug 30 '17 at 16:19
  • @Suparshva Ah, I see. No, my first comment at the end says "... but I'd rather take the bullet with the accuracy than the performance"... meaning that I ended up using single precision instead. I also didn't go with any FFT-based solutions because it wasn't required for my specific use case (even though I did implement a convolution in 2D). – rayryeng Aug 30 '17 at 16:21
  • What is "compute capability 1.3"? Can you provide some context (by editing your answer)? – Peter Mortensen Nov 17 '17 at 23:31
  • @PeterMortensen: yes, “compute capability” is just an nVidia term for a number, somewhat like a version number, that describes which features are available on a given GPU. See the [CUDA Wikipedia page - section “Version features and feature specifications”](https://en.m.wikipedia.org/wiki/CUDA#Version_features_and_specifications) for further details (I’ve also added this link to the answer). Note that 1.3 is quite old now - current high end GPUs have compute capability up to 7.1. – Paul R Nov 18 '17 at 06:49

As a tip:

If you want to use double precision you have to set the GPU architecture to sm_13 (if your GPU supports it).

Otherwise the compiler will still convert all doubles to floats and give only a warning (as seen in faya's post). (Very annoying if you get an error because of this :-) )

The flag is: -arch=sm_13
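For reference, a typical compile line would then look something like this (the file name is just illustrative):

```
nvcc -arch=sm_13 kernel.cu -o kernel
```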

phuclv

Following on from Paul R's comments, Compute Capability 2.0 devices (aka Fermi) have much improved double-precision support, with performance only half that of single-precision.

This Fermi whitepaper has more details on the double-precision performance of the new devices.
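Practically speaking, that means you would target Fermi with the newer architecture flag instead of `sm_13` (the file name is again just illustrative):

```
nvcc -arch=sm_20 kernel.cu -o kernel
```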

Edric
  • +1: thanks for that additional info - I haven't worked with CUDA for about a year now and wasn't aware of Compute Capability 2.0 - nothing in tech stays still for very long! – Paul R May 12 '10 at 09:58
  • Be aware though that Fermi's double precision performance is (artificially) lower for GeForce cards than for Teslas. Quadro cards should have the same performance level as Tesla cards. – Eric Jul 07 '10 at 12:22
  • Unfortunately, Quadro cards appear to be priced at around 10 times the price of GeForce cards with corresponding GPUs (though Quadro cards come with more memory). – Roger Dahl Jan 28 '11 at 21:15

As mentioned by others, older CUDA cards don't support the double type. But if you want more precision than your old GPU provides, you can use the float-float approach, which is similar to the double-double technique. For more information, read up on that technique.

Of course, on modern GPUs you can also use double-double to achieve precision beyond double. (double-double is also used to implement long double on PowerPC.)
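To make the idea concrete, here is a minimal sketch of a float-float addition based on Knuth's two-sum error-free transformation. The struct and function names are illustrative, not from any particular library, and you need the compiler to preserve the exact order of the additions and subtractions (e.g. avoid nvcc's `-use_fast_math`):

```cuda
// A float-float value represents x as hi + lo, where |lo| is at most
// half an ulp of hi. This roughly doubles the significand width.
struct ff { float hi, lo; };

__device__ ff ff_add(ff a, ff b)
{
    // Knuth's two-sum: s + e == a.hi + b.hi exactly, despite rounding
    float s = a.hi + b.hi;
    float v = s - a.hi;
    float e = (a.hi - (s - v)) + (b.hi - v);

    // Fold in the low-order parts, then renormalize the result
    e += a.lo + b.lo;
    ff r;
    r.hi = s + e;
    r.lo = e - (r.hi - s);
    return r;
}
```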

phuclv