Can anybody help me with atomicmin function syntax for cuda?

Question

I am unable to get a proper syntax for using atomicMin. I would like to use this function to operate on double rather than integer.

```
__global__ void npd(int *a, int *g)
{
    int index = threadIdx.x;

    __shared__ int d[N];

    d[threadIdx.x] = a[index];

    __syncthreads();

    int dd;
    int inn;
    int u;

    if( 0==threadIdx.x )
    {
        for( int u = 0; u<16; u++ )
        {
            atomicMin( g, d ) ;
        }
    }
}
```

There is none inbuilt, but there are some nifty tricks you can use to create your own: http://stereopsis.com/radix.html — Ander Biguri, Mar 13 '19 at 11:59
This Devtalk forum post seems to give a solution, I haven't tested it, give it a go : https://devtalk.nvidia.com/default/topic/492068/atomicmin-with-float/ — Ander Biguri, Mar 13 '19 at 12:04
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#atomicmin -- the documentation clearly says there is no atomicMin for any floating point types, single or double — talonmies, Mar 13 '19 at 12:15
@talonmies I assume the question is less literal than your answer suggests. OP may know it does not exist, and is asking for help on how to implement it. — Ander Biguri, Mar 13 '19 at 13:32

Robert Crovella · Answer 1 · 2019-03-13T15:53:52.890

The atomicMin function defined by CUDA doesn't support use with floating-point quantities. Referring to the documentation, we see that the only available prototypes are for int, unsigned int, and unsigned long long int (the last requiring compiling for, and running on, a GPU of compute capability 3.5 or higher).

There are at least 2 options.

1. You could refactor your code to replace atomics with a classical parallel reduction.
2. As indicated in the programming guide, "arbitrary" atomics can be created using atomicCAS (compare-and-swap) plus a loop of some sort.
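For the first option, a minimal sketch of a two-pass shared-memory reduction might look like the following. The names `block_min`, `reduce_min`, and `kBlock` are made up for illustration, `DBL_MAX` is used as the identity (so this assumes finite inputs), and error checking on the CUDA runtime calls is omitted for brevity.

```cuda
#include <cuda_runtime.h>
#include <cfloat>
#include <cassert>

// Hypothetical block size for this sketch; must be a power of two.
constexpr int kBlock = 256;

// Each block reduces its share of `in` to one minimum, stored in out[blockIdx.x].
// Must be launched with blockDim.x == kBlock.
__global__ void block_min(const double* in, double* out, int n)
{
    __shared__ double s[kBlock];

    // Grid-stride loop: fold this thread's elements into a private minimum.
    double v = DBL_MAX;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += gridDim.x * blockDim.x)
        v = fmin(v, in[i]);
    s[threadIdx.x] = v;
    __syncthreads();

    // Tree reduction in shared memory.
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (threadIdx.x < stride)
            s[threadIdx.x] = fmin(s[threadIdx.x], s[threadIdx.x + stride]);
        __syncthreads();
    }
    if (threadIdx.x == 0) out[blockIdx.x] = s[0];
}

// Hypothetical host helper: minimum of n host values, using no atomics.
double reduce_min(const double* h_in, int n)
{
    assert(n > 0);
    int blocks = (n + kBlock - 1) / kBlock;
    if (blocks > 64) blocks = 64;  // arbitrary cap; the grid-stride loop covers the rest

    double *d_in, *d_partial, *d_out;
    double result;
    cudaMalloc(&d_in, n * sizeof(double));
    cudaMalloc(&d_partial, blocks * sizeof(double));
    cudaMalloc(&d_out, sizeof(double));
    cudaMemcpy(d_in, h_in, n * sizeof(double), cudaMemcpyHostToDevice);

    block_min<<<blocks, kBlock>>>(d_in, d_partial, n);      // pass 1: per-block minima
    block_min<<<1, kBlock>>>(d_partial, d_out, blocks);     // pass 2: reduce the partials

    cudaMemcpy(&result, d_out, sizeof(double), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_partial);
    cudaFree(d_out);
    return result;
}
```

The second launch reuses the same kernel on the per-block results, so no thread ever contends on a single global location.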

Here's one possible realization of the second option (atomicCAS plus a loop), for double:

```
__device__ double atomicMin_double(double* address, double val)
{
    unsigned long long int* address_as_ull = (unsigned long long int*) address;
    unsigned long long int old = *address_as_ull, assumed;
    do {
        assumed = old;
        old = atomicCAS(address_as_ull, assumed,
            __double_as_longlong(fmin(val, __longlong_as_double(assumed))));
    } while (assumed != old);
    return __longlong_as_double(old);
}
```
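As a rough usage sketch, the function could be called from a kernel like this. The kernel `min_kernel` and the host helper `atomic_min_of` are hypothetical names, the result is seeded with the first input element so no special "infinity" value is needed, and error checking is left out.

```cuda
#include <cuda_runtime.h>
#include <cassert>

// Same CAS-loop function as in the answer, repeated so this sketch is self-contained.
__device__ double atomicMin_double(double* address, double val)
{
    unsigned long long int* address_as_ull = (unsigned long long int*) address;
    unsigned long long int old = *address_as_ull, assumed;
    do {
        assumed = old;
        old = atomicCAS(address_as_ull, assumed,
            __double_as_longlong(fmin(val, __longlong_as_double(assumed))));
    } while (assumed != old);
    return __longlong_as_double(old);
}

// Every in-range thread folds one element into *result.
__global__ void min_kernel(const double* in, double* result, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        atomicMin_double(result, in[i]);
}

// Hypothetical host helper: minimum of n host values via the atomic.
double atomic_min_of(const double* h_in, int n)
{
    assert(n > 0);
    double *d_in, *d_result;
    double result = h_in[0];  // seed with a real element of the input

    cudaMalloc(&d_in, n * sizeof(double));
    cudaMalloc(&d_result, sizeof(double));
    cudaMemcpy(d_in, h_in, n * sizeof(double), cudaMemcpyHostToDevice);
    cudaMemcpy(d_result, &result, sizeof(double), cudaMemcpyHostToDevice);

    int threads = 128;
    int blocks = (n + threads - 1) / threads;
    min_kernel<<<blocks, threads>>>(d_in, d_result, n);

    cudaMemcpy(&result, d_result, sizeof(double), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_result);
    return result;
}
```

Having every thread hit the same address is simple but serializes under contention, so this is meant to show the call pattern rather than a tuned kernel.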

This related question and answers may also be of interest, although it primarily has float in view.

A few other comments:

- By switching to float instead of double, I believe it is possible to simplify the atomicMin (or atomicMax) operation as indicated in the answer I linked to above, probably with a few caveats (e.g. no NaN or INF data). IEEE 754 float follows an ordering rule for two non-negative quantities A and B such that if A > B, then *reinterpret_cast<int*>(&A) > *reinterpret_cast<int*>(&B). For negative values the integer ordering is reversed, because the format is sign-magnitude, so negative inputs need separate handling. double has the same layout with respect to long long, so the same rule applies there.
- In your code, the loop could operate on a local quantity first, then do a single atomic operation at the end (assuming g and d have been changed to double), like so:
```
    double v = *g;
    for( int u = 0; u<16; u++ )
    {
        v = min(v, d[u]);
    }
    atomicMin_double(g, v);
```
which I think should be faster.
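The float ordering idea from the first comment can be sketched as below. This is an illustration under stated assumptions, not a tested library routine: `atomicMin_float`, `float_min_kernel`, and `float_min_of` are made-up names, NaN inputs are assumed absent, and error checking is omitted. It picks the integer atomic by the sign bit of the incoming value: a signed `atomicMin` when the sign bit is clear, an unsigned `atomicMax` when it is set.

```cuda
#include <cuda_runtime.h>
#include <cassert>

// Hypothetical float atomic min built on the integer atomics. Assumes no NaN.
__device__ float atomicMin_float(float* address, float val)
{
    int bits = __float_as_int(val);
    if (bits >= 0) {
        // Sign bit clear: signed-int order matches float order for non-negative
        // values, and any stored negative float looks like a negative int, so it wins.
        return __int_as_float(atomicMin((int*)address, bits));
    }
    // Sign bit set: viewed as unsigned, more negative floats are larger, and every
    // negative float is larger than every non-negative one, so max picks the float min.
    return __uint_as_float(atomicMax((unsigned int*)address, (unsigned int)bits));
}

// Every in-range thread folds one element into *result.
__global__ void float_min_kernel(const float* in, float* result, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        atomicMin_float(result, in[i]);
}

// Hypothetical host helper: minimum of n host floats.
float float_min_of(const float* h_in, int n)
{
    assert(n > 0);
    float *d_in, *d_result;
    float result = h_in[0];  // seed with a real element of the input

    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_result, sizeof(float));
    cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_result, &result, sizeof(float), cudaMemcpyHostToDevice);

    int threads = 128;
    int blocks = (n + threads - 1) / threads;
    float_min_kernel<<<blocks, threads>>>(d_in, d_result, n);

    cudaMemcpy(&result, d_result, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_result);
    return result;
}
```

Testing the sign bit rather than `val >= 0` keeps -0.0 on the unsigned path, where it cannot displace a genuinely negative stored value.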
