Based on this question and this question, I put together a minimal example that sums a vector into a single double in OpenCL 1.2.
/* https://suhorukov.blogspot.com/2011/12/opencl-11-atomic-operations-on-floating.html */
inline void AtomicAdd(volatile __global double *source, const double operand) {
    union { unsigned int intVal; double floatVal; } prevVal, newVal;
    do {
        prevVal.floatVal = *source;
        newVal.floatVal = prevVal.floatVal + operand;
    } while (atomic_cmpxchg((volatile __global unsigned int *)source, prevVal.intVal, newVal.intVal) != prevVal.intVal);
}

void kernel cost_function(__constant double* inputs, __global double* outputs) {
    int index = get_global_id(0);
    if (0 == index) { outputs[0] = 0.0; } /* the first work-item resets the accumulator */
    barrier(CLK_GLOBAL_MEM_FENCE);
    AtomicAdd(&outputs[0], inputs[index]); /* (1) */
    //AtomicAdd(&outputs[0], 5.0); /* (2) */
}
However, this solution is incorrect: the result is always 0 when the output buffer is read back. What might be the problem? The code at /* (1) */ doesn't work, and neither does the code at /* (2) */, which is only there to test the logic independently of any inputs.
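One possible explanation for case (2), sketched in plain host-side C rather than OpenCL C so it can be run without a device: the union pairs a 64-bit double with a 32-bit unsigned int, so the compare-and-swap only ever touches one 32-bit half of the stored value. The helper names (double_bits, low_word, add_via_32bit_cas) are hypothetical, and the sketch assumes a little-endian layout in which intVal aliases the low 32 bits of the double. It models a single uncontended pass of the loop, not the real device behaviour.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a double as its 64-bit pattern (memcpy avoids aliasing problems). */
static uint64_t double_bits(double d) {
    uint64_t bits;
    assert(sizeof bits == sizeof d);
    memcpy(&bits, &d, sizeof bits);
    return bits;
}

/* Low 32 bits of the pattern: what intVal would alias on a
   little-endian device (assumption, not taken from the question). */
static uint32_t low_word(double d) {
    return (uint32_t)(double_bits(d) & 0xFFFFFFFFu);
}

/* Emulates one uncontended pass of the loop in AtomicAdd:
   only the low 32 bits of the stored value are compared and swapped,
   the high 32 bits are never written. */
static double add_via_32bit_cas(double stored, double operand) {
    uint64_t bits = double_bits(stored);
    uint32_t expected = low_word(stored);
    uint32_t desired = low_word(stored + operand);
    if ((uint32_t)(bits & 0xFFFFFFFFu) == expected) {
        bits = (bits & 0xFFFFFFFF00000000ULL) | desired;
    }
    double result;
    memcpy(&result, &bits, sizeof result);
    return result;
}
```

Under these assumptions add_via_32bit_cas(0.0, 5.0) still yields 0.0. The bit pattern of 5.0 is 0x4014000000000000, so its low word is zero: the swap replaces zero with zero, and the high word holding the sign, exponent and upper mantissa bits is never stored.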
Is barrier(CLK_GLOBAL_MEM_FENCE); used correctly here to reset the output before any calculations are performed on it?
According to the OpenCL 1.2 specification, atomic operations support single-precision floating-point numbers. Is this AtomicAdd a feasible way of extending that support to double-precision numbers, or am I missing something?
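For comparison, here is a minimal sketch of the same compare-and-swap loop carried out over the full 64 bits, written in host-side C11 (stdatomic.h) rather than OpenCL C. The names atomic_add_double and load_double are hypothetical. In OpenCL 1.2 the analogous kernel code would, as far as I can tell, need a ulong member in the union and the 64-bit atom_cmpxchg from the cl_khr_int64_base_atomics extension, which the device would have to support in addition to cl_khr_fp64.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Adds operand to the double stored (as raw bits) in *target,
   retrying until the 64-bit compare-and-swap succeeds. */
static void atomic_add_double(_Atomic uint64_t *target, double operand) {
    uint64_t expected = atomic_load(target);
    uint64_t desired;
    assert(sizeof(double) == sizeof(uint64_t));
    do {
        double current;
        memcpy(&current, &expected, sizeof current);
        double sum = current + operand;
        memcpy(&desired, &sum, sizeof desired);
        /* On failure, expected is refreshed with the value currently stored. */
    } while (!atomic_compare_exchange_weak(target, &expected, desired));
}

/* Reads the accumulator back as a double. */
static double load_double(_Atomic uint64_t *target) {
    uint64_t bits = atomic_load(target);
    double value;
    memcpy(&value, &bits, sizeof value);
    return value;
}
```

Starting from an accumulator whose bits are all zero (the pattern of 0.0), calling atomic_add_double with 5.0 and then with 2.5 leaves load_double returning 7.5, because the whole 64-bit pattern is compared and replaced in one step.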
The device I am testing with supports cl_khr_fp64, of course.