
Why hasn't atomicAdd() for doubles been implemented explicitly as part of CUDA 4.0 or higher?

From Appendix F, page 97, of the CUDA Programming Guide 4.1, the following versions of atomicAdd() have been implemented:

int atomicAdd(int* address, int val);
unsigned int atomicAdd(unsigned int* address,
                       unsigned int val);
unsigned long long int atomicAdd(unsigned long long int* address,
                                 unsigned long long int val);
float atomicAdd(float* address, float val);

The same page goes on to give a small implementation of atomicAdd() for doubles, which I have just started using in my project:

__device__ double atomicAdd(double* address, double val)
{
    unsigned long long int* address_as_ull =
                             (unsigned long long int*)address;
    unsigned long long int old = *address_as_ull, assumed;
    do {
        assumed = old;
        old = atomicCAS(address_as_ull, assumed,
                        __double_as_longlong(val +
                                             __longlong_as_double(assumed)));
    } while (assumed != old);
    return __longlong_as_double(old);
}
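
To illustrate how I am using it, here is a minimal sketch of a kernel that accumulates an array into a single double with this helper; the kernel name and launch configuration are my own, not part of the guide's example.

__global__ void sumKernel(const double* in, double* out, int n)
{
    // Each thread adds one element into a single accumulator; on current
    // hardware this resolves to the CAS-loop overload defined above.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        atomicAdd(out, in[i]);
}

A launch such as sumKernel<<<(n + 255) / 256, 256>>>(d_in, d_out, n) then produces the same result as a serial sum, up to floating-point ordering.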

Why not define the above code as part of CUDA?

  • Probably so that every user of it is aware of its implementation, as it is not a built-in instruction, and the retry logic can be subject to livelocks (as there is no guarantee of fairness, a thread can get stalled for as long as other threads keep updating the same variable). – tera Sep 27 '12 at 16:56

1 Answer


Edit: As of CUDA 8, double-precision atomicAdd() is implemented in CUDA with hardware support in SM_6X (Pascal) GPUs.

Currently, no CUDA devices support atomicAdd for double in hardware. As you noted, it can be implemented in terms of atomicCAS on 64-bit integers, but there is a non-trivial performance cost for that.

Therefore, the CUDA software team chose to document a correct implementation as an option for developers, rather than make it part of the CUDA standard library. This way developers are not unknowingly opting in to a performance cost they don't understand.
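
If it helps, here is a sketch of how the documented fallback is commonly guarded once CUDA 8 provides the native overload for SM_60 and above; the __CUDA_ARCH__ threshold of 600 is the usual convention, but check your toolkit's headers before relying on it.

#if !defined(__CUDA_ARCH__) || __CUDA_ARCH__ < 600
// Software fallback for pre-Pascal devices: emulate atomicAdd on double via
// a compare-and-swap loop on the 64-bit integer representation.
__device__ double atomicAdd(double* address, double val)
{
    unsigned long long int* address_as_ull = (unsigned long long int*)address;
    unsigned long long int old = *address_as_ull, assumed;
    do {
        assumed = old;
        old = atomicCAS(address_as_ull, assumed,
                        __double_as_longlong(val + __longlong_as_double(assumed)));
    } while (assumed != old);
    return __longlong_as_double(old);
}
#endif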

Aside: I don't think this question should be closed as "not constructive". I think it's a perfectly valid question, +1.

  • Yes, but technically you are one of very few people who can answer the question. While I have said why I think it makes a lot of sense this way, only you can say whether this is why the CUDA team chose it that way. ;-) Anyway, I wasn't the one to downvote the question. – tera Sep 28 '12 at 00:22
  • There are multiple NVIDIA folks who read and answer CUDA questions on SO (especially while our developer forums are down), and that fact makes questions like this valid. And you could have posted your comment as an answer, and it would have been correct, and I would have upvoted it. :) BTW, I didn't assume you downvoted; I was referring to the one vote to close the question. – harrism Sep 28 '12 at 00:35
  • I agree, this is a perfectly valid question; the CUDA headers could have implemented double atomics in software. Although the way it was formulated raised a red flag for some people, I think the decision should be reverted! – pszilard Feb 26 '13 at 23:03