Update:
I've run this example with other systems. On an Intel i7-3630QM, Intel HD4000 and Radeon HD 7630M, all results are the same. With an i7-4700MQ / 4800MQ the results of the CPU are different when OpenCL or a 64 bit gcc is used from an 32 bit gcc. This is a result of the 64 bit gcc and OpenCl using SSE by default and the 32 bit gcc using 387 math, at least the 64 bit gcc produces the same results when -mfpmath=387 is set. So I have to read a lot more and experiment with x86 floating point. Thank you all for your answers.
I've run the Lorenz system example from "Programming CUDA and OpenCL: A case study using modern C++ libraries" for ten systems each on different OpenCL devices and am getting different results:
Quadro K1100M (NVIDIA CUDA)
R => x y z
0.100000 => -0.000000 -0.000000 0.000000
5.644444 => -3.519254 -3.519250 4.644452
11.188890 => 5.212534 5.212530 10.188904
16.733334 => 6.477303 6.477297 15.73333322.277779 => 3.178553 2.579687 17.946903
27.822224 => 5.008720 7.753564 16.377680
33.366669 => -13.381100 -15.252210 36.107887
38.911114 => 4.256534 6.813675 23.838787
44.455555 => -11.083726 0.691549 53.632290
50.000000 => -8.624105 -15.728293 32.516193Intel(R) HD Graphics 4600 (Intel(R) OpenCL)
R => x y z
0.100000 => -0.000000 -0.000000 0.000000
5.644444 => -3.519253 -3.519250 4.644451
11.188890 => 5.212531 5.212538 10.188890
16.733334 => 6.477320 6.477326 15.73333922.277779 => 7.246771 7.398651 20.735369
27.822224 => -6.295782 -10.615027 14.646572
33.366669 => -4.132523 -7.773201 14.292910
38.911114 => 14.183139 19.582197 37.943520
44.455555 => -3.129006 7.564254 45.736408
50.000000 => -9.146419 -17.006729 32.976696Intel(R) Core(TM) i7-4800MQ CPU @ 2.70GHz (Intel(R) OpenCL)
R => x y z
0.100000 => -0.000000 -0.000000 0.000000
5.644444 => -3.519254 -3.519251 4.644453
11.188890 => 5.212513 5.212507 10.188900
16.733334 => 6.477303 6.477296 15.73333222.277779 => -8.295195 -8.198518 22.271002
27.822224 => -4.329878 -4.022876 22.573458
33.366669 => 9.702943 3.997370 38.659538
38.911114 => 16.105495 14.401397 48.537579
44.455555 => -12.551083 -9.239071 49.378693
50.000000 => 7.377638 3.447747 47.542763
As you can see, the three devices agree on the values up to R=16.733334 and then start to diverge.
I have run the same region with odeint without VexCL and get results close to the outcome of the OpenCL on CPU run:
Vanilla odeint:
R => x y z
16.733334 => 6.47731 6.47731 15.7333
22.277779 => -8.55303 -6.72512 24.7049
27.822224 => 3.88874 3.72254 21.8227
The example code can be found here: https://github.com/ddemidov/gpgpu_with_modern_cpp/blob/master/src/lorenz_ensemble/vexcl_lorenz_ensemble.cpp
I'm not sure what I am seeing here? Since the CPU results are so close to each other, it looks like an issue with the GPUs, but since I am an OpenCL newbie I need some pointers how to find the underlying cause of this.