
I'm new to CUDA and OpenCL.

I have translated a program's kernels from CUDA to OpenCL. I'm using the same seeds for random number generation in both versions. While the OpenCL version gets exactly the same results every run, the CUDA version gives slightly different results every run. I'm compiling the CUDA version without -use_fast_math. My device is compute capability 1.1. Any idea what could be the reason?

Thanks in advance

Sullivan Risk
James Stewart

  • Doesn't that mean that your OpenCL version is incorrect? – talonmies Jul 10 '13 at 14:52
  • There's not enough information in your question to make any informed statements, in my opinion. Can you provide a short, complete compilable code that demonstrates the problem? (A complete code that I can copy, paste, and compile without having to add or edit anything.) Are you doing [cuda error checking](http://stackoverflow.com/questions/14038589/what-is-the-canonical-way-to-check-for-errors-using-the-cuda-runtime-api) on all cuda API calls and kernel calls? Have you run your code with `cuda-memcheck` to look for out-of-bounds accesses in your kernel or other problems? – Robert Crovella Jul 10 '13 at 14:54
  • Actually, I'm implementing an Artificial Neural Network. It is hard to provide all of the code. – James Stewart Jul 10 '13 at 15:02
  • The OpenCL version seems to be working correctly, and the final weights are always the same since I start with a fixed seed. The CUDA version gives slightly different results. My question actually is: are there any CUDA compilation options that may affect the accuracy of the CUDA results? – James Stewart Jul 10 '13 at 15:05
  • And thanks for your effort and your answers – James Stewart Jul 10 '13 at 15:06
  • I have voted to close this. I fail to see how this question could be answered in its current form.... – talonmies Jul 10 '13 at 15:09
  • Cut down the code to something short that reproduces the problem. Frequently, when people create a [reproducer](http://sscce.org/), they discover the problem themselves. You also haven't answered my questions about cuda error checking or cuda-memcheck. – Robert Crovella Jul 10 '13 at 15:09
  • Yes, I did CUDA error checking, but I did not run my code with cuda-memcheck yet. I will do that and tell you. Thanks – James Stewart Jul 10 '13 at 15:15
  • I ran cuda-memcheck and it reported no errors. – James Stewart Jul 10 '13 at 18:11
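
For reference, here is a minimal sketch of the error-checking pattern suggested in the comments above, assuming the CUDA runtime API; the `checkCuda` macro name is illustrative and not from the linked question:

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message whenever a CUDA runtime call fails.
#define checkCuda(call)                                                   \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            fprintf(stderr, "CUDA error: %s at %s:%d\n",                  \
                    cudaGetErrorString(err_), __FILE__, __LINE__);        \
            exit(EXIT_FAILURE);                                           \
        }                                                                 \
    } while (0)

// Usage: wrap every API call, and check kernel launches explicitly, e.g.
//   checkCuda(cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice));
//   myKernel<<<grid, block>>>(d_in, d_out);
//   checkCuda(cudaGetLastError());        // catches launch errors
//   checkCuda(cudaDeviceSynchronize());   // catches errors during execution
```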

2 Answers


Devices of compute capability 1.1 do not support double operations. So if you are using double they are getting demoted to float. That could possibly affect your results, although a compute capability 1.1 device cannot support double in OpenCL either, AFAIK.
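
As a hedged aside (this kernel is hypothetical, not from the question): when a kernel that uses `double` is compiled for an sm_1x target, nvcc warns about the demotion and the arithmetic actually runs in single precision.

```cuda
// When compiled with -arch=sm_11, nvcc demotes double to float, so this runs
// in single precision and loses accuracy compared to a device with real
// double-precision support.
__global__ void axpy(double a, const double *x, double *y, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = a * x[i] + y[i];
}
```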

> My question actually is: are there any CUDA compilation options that may affect the accuracy of the CUDA results?

Yes, there are a variety of compiler options that affect CUDA's handling of floating-point math (for example `-use_fast_math` and `--fmad`).
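
As a sketch of the kind of thing those options control (hypothetical code, not from this answer): with `--fmad=true` (the default) the compiler may contract a multiply followed by an add into a single multiply-add instruction, which rounds differently from the separate multiply and add emitted under `--fmad=false`.

```cuda
// The expression below is a candidate for FMAD/FMA contraction; whether it is
// contracted (and therefore how it rounds) depends on compiler options such
// as --fmad and -use_fast_math.
__global__ void dot_step(const float *a, const float *b, float *acc, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        acc[i] += a[i] * b[i];
}
// Assumed build line for comparison:  nvcc -arch=sm_11 --fmad=false kernels.cu
```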

I don't know why any of this would lead to variation from one run to the next, however. It's likely that you have a bug in the code.

Robert Crovella
  • Actually, I'm using float, not double operations. – James Stewart Jul 10 '13 at 15:22
  • My OpenCL version is an exact translation of the CUDA kernels; the CUDA version is the original one. How could the translation (the OpenCL version) work correctly while the original code (CUDA) doesn't? My guess is that it is because of my CUDA compilation. – James Stewart Jul 10 '13 at 15:24
  • @JamesStewart: Are you actually asking why code you didn't write and haven't shown doesn't (in your opinion) work correctly when your translation of that code to OpenCL (again in your opinion) does work correctly? – talonmies Jul 10 '13 at 16:08
  • I'm asking about the effect of compilation options on the accuracy of the results. – James Stewart Jul 10 '13 at 16:42
  • Compiler options should not cause the result to vary each time you run your CUDA code. They could cause differences between your CUDA and OpenCL results, but the difference should always be the same. My first suspect is an out-of-bounds memory access. Old CUDA devices did not catch these for global memory so wrong results could arise. Second, but less likely is a hardware fault. Try cuda-memcheck and maybe also gpu-burn - [link](http://wili.cc/blog/gpu-burn.html) - if you suspect hardware problems. – chippies Jul 10 '13 at 16:58
  • Without actually seeing a reproducer, I'm out of suggestions. – Robert Crovella Jul 10 '13 at 18:14
  • I found the problem. In the original code, some values were updated asynchronously and were not completely updated yet. Thanks everybody for the help. – James Stewart Jul 11 '13 at 15:15

I found the problem. In the original code, some values were updated asynchronously and had not been completely updated yet when they were read. Thanks everybody for the help, and sorry for the trouble.
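
For anyone who hits the same symptom, a minimal sketch of this class of bug (hypothetical code, not the original neural-network program): results that depend on values other threads are still writing vary from run to run, and the fix is to synchronize before reading. For example, a block-level reduction is only correct with barriers between the writes and the reads:

```cuda
// Assumes the kernel is launched with 256 threads per block. Removing either
// __syncthreads() lets threads read partial sums that other threads have not
// finished writing, which produces slightly different results on every run.
__global__ void block_sum(const float *in, float *out)
{
    __shared__ float s[256];
    int t = threadIdx.x;
    s[t] = in[blockIdx.x * blockDim.x + t];
    __syncthreads();                        // writes must complete before reads

    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (t < stride)
            s[t] += s[t + stride];
        __syncthreads();                    // barrier needed after each step
    }
    if (t == 0)
        out[blockIdx.x] = s[0];
}
```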

James Stewart