Define atomicAdd function doesn't work in CUDA

Question

Since CUDA 2.0x doesn't provide an atomicAdd() function for double, I defined my own version, named atomicAddd(), following this question:

Why has atomicAdd not been implemented for doubles?

Here is the code for the device function:

__device__ double atomicAddd(double* address, double val)
{
    unsigned long long int* address_as_ull =
                             (unsigned long long int*)address;
    unsigned long long int old = *address_as_ull, assumed;
    do {
        assumed = old;
        old = atomicCAS(address_as_ull, assumed,
                        __double_as_longlong(val +
                               __longlong_as_double(assumed)));
    } while (assumed != old);
    return __longlong_as_double(old);
}

The code is the same except the function name.

Here is part of my kernel:

__global__ void test(double *dev_like, double *dev_sum){
    __shared__ double lik;
    // some code to compute lik
    // copy lik back to global dev_like
    dev_like[blockIdx.x] = lik;

    // add lik to dev_sum (once per block)
    if(threadIdx.x == 0){
        atomicAddd(dev_sum, lik);
    }

}

Afterwards I copy dev_like back to the host and add its elements up into sum, and I also copy dev_sum back to the host as sum1. My understanding is that sum should be equal to sum1. Here is my host code to print them:

for (int m = 0; m < 100; ++m){
    if(sum[m] == sum1[m]){
        std::cout << "True" << std::endl;
    }
    else{
        std::cout << "False" << "\t" << std::setprecision(20) << sum[m] << "\t" << std::setprecision(20) << sum1[m] << std::endl;
    }
}

and I get the following result:

True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
False   -1564.0205173292260952  -1564.0205173292256404
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
False   -1563.4011523293495429  -1563.4011523293493156
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True

Some results show False, but the difference between sum and sum1 is very small. I have no idea what the problem is.

tera · Accepted Answer · 2016-11-25T10:36:35.690

5

Unlike mathematical addition, floating point addition is not associative because of the rounding step involved. In situations where atomic operations are necessary, the order of operations is not deterministic. So nondeterministic rounding errors are inevitable.
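To see the effect in isolation, here is a minimal host-side C++ sketch (not code from the question). The helper names sum_in_order and nearly_equal are hypothetical, and the default tolerance of 1e-12 is only an illustrative assumption that you would tune for your data. It sums the same values in two different orders, which mimics atomic adds arriving in a different order than the sequential host sum, and then compares the results with a relative tolerance instead of ==.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Sum the values strictly left to right, in the order given.
// (Hypothetical helper, stands in for "whatever order the adds happen in".)
double sum_in_order(const std::vector<double>& v) {
    double s = 0.0;
    for (double x : v) {
        s += x;  // each += rounds the intermediate result to a double
    }
    return s;
}

// Compare two doubles with a relative tolerance instead of ==.
// rel_tol = 1e-12 is an assumed, illustrative default.
bool nearly_equal(double a, double b, double rel_tol = 1e-12) {
    assert(rel_tol >= 0.0);
    const double scale = std::max(std::fabs(a), std::fabs(b));
    return std::fabs(a - b) <= rel_tol * scale;
}
```

For example, under IEEE 754 double arithmetic sum_in_order({0.1, 0.2, 0.3}) and sum_in_order({0.3, 0.2, 0.1}) are not bitwise equal, yet nearly_equal accepts them. The two mismatching pairs printed in the question also pass such a check, so replacing sum[m] == sum1[m] with a tolerance-based comparison is one way to make the host test robust.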

edited Nov 25 '16 at 10:36

answered Nov 24 '16 at 11:56

tera


Understand it now. Thanks. – Fly_back Nov 25 '16 at 02:04
