Devices of compute capability 2.x don't provide an atomicAdd() for double, so I defined my own version, named atomicAddd(), following the answer to this question:
Why has atomicAdd not been implemented for doubles?
Here is the code for the device function:
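__device__ double atomicAddd(double* address, double val)
{
    // Reinterpret the double as a 64-bit integer so atomicCAS can operate on it.
    unsigned long long int* address_as_ull =
        (unsigned long long int*)address;
    unsigned long long int old = *address_as_ull, assumed;
    do {
        assumed = old;
        // Store assumed + val only if no other thread changed *address in the meantime.
        old = atomicCAS(address_as_ull, assumed,
                        __double_as_longlong(val +
                        __longlong_as_double(assumed)));
    } while (assumed != old);  // integer comparison, so the loop also terminates for NaN
    return __longlong_as_double(old);
}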
The code is identical to the one in that answer except for the function name.
Here is part of my kernel:
__global__ void test(double *dev_like, double *dev_sum){
    __shared__ double lik;
    // some code to compute lik;
    // copy lik back to global dev_like
    dev_like[blockIdx.x] = lik;
    // add lik to dev_sum (one thread per block does the atomic add)
    if (threadIdx.x == 0) {
        atomicAddd(dev_sum, lik);
    }
}
After the kernel runs, I copy dev_like back to the host and add its elements to get sum, and I also copy dev_sum back to the host as sum1. My understanding is that sum should be the same as sum1. Here is my host code to print them:
// requires <iostream> and <iomanip>
for (int m = 0; m < 100; ++m) {
    if (sum[m] == sum1[m]) {
        std::cout << "True" << std::endl;
    } else {
        // setprecision is sticky, so it applies to both values printed after it
        std::cout << "False" << "\t" << std::setprecision(20) << sum[m]
                  << "\t" << sum1[m] << std::endl;
    }
}
and I get the following output:
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
False -1564.0205173292260952 -1564.0205173292256404
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
False -1563.4011523293495429 -1563.4011523293493156
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
Some results show False, but the difference between sum and sum1 is very small. I have no idea what the problem is.