I'm new at CUDA and OpenCL.
I have translated the kernels of a program from CUDA kernels to OpenCL kernels. I'm using the same seeds for the random number generation in both versions.
While the OpenCL version gets the exact same results every run, the CUDA version gives a slight different results every run.
I'm compiling the CUDA version without -use_fast_math
.
My device is 1.1 capability.
Any idea about what could be the reason?
Thanks in advance