All, I have the following lines of code for setting up a 3D image in OpenCL:
const size_t NPOLYORDERS = 16;
const size_t NPOLYBINS = 1024;
cl::Image3D my3DImage;
cl::ImageFormat imFormat(CL_R, CL_FLOAT);
my3DImage = cl::Image3D(clContext, CL_MEM_READ_ONLY, imFormat, NPOLYORDERS, NPOLYORDERS, NPOLYBINS);
The code runs fine when I use the Intel OpenCL CPU driver (by creating a context with CL_DEVICE_TYPE_CPU), but it segfaults when I use the nVidia driver with a TITAN Black (by creating a context with CL_DEVICE_TYPE_GPU).
All of this is on RHEL6.4 with a 2.6.32-358 kernel and the latest nVidia driver available, together with the Intel OpenCL runtime 14.1_x64_4.4.0.118 and the Intel OpenCL SDK 2014_4.4.0.134_x64.
All of the other code appears to be working on the nVidia device. I can compile the kernel, create contexts, buffers, etc., but this one constructor fails. I queried the maximum Image3D dimensions with cl::Device::getInfo, and it reports width x height x depth limits of 4096x4096x4096, so my 16x16x1024 image is well below the limit.
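To make the size argument concrete, here is a minimal sketch of the bounds check and the memory footprint, using the numbers quoted above. The names Image3DLimits, fitsLimits and imageBytes are hypothetical helpers for illustration, not part of OpenCL or cl.hpp, and the 4 bytes per texel assumes a single-channel CL_R / CL_FLOAT format.

```cpp
#include <cstddef>

// Hypothetical container for the limits a device reports
// (e.g. the values read back through cl::Device::getInfo).
struct Image3DLimits {
    std::size_t maxWidth;
    std::size_t maxHeight;
    std::size_t maxDepth;
};

// True when every dimension is non-zero and within the reported limits.
constexpr bool fitsLimits(std::size_t w, std::size_t h, std::size_t d,
                          const Image3DLimits& lim) {
    return w > 0 && h > 0 && d > 0 &&
           w <= lim.maxWidth && h <= lim.maxHeight && d <= lim.maxDepth;
}

// Unpadded size of the image in bytes (ignores any row or slice pitch
// the driver may add internally).
constexpr std::size_t imageBytes(std::size_t w, std::size_t h, std::size_t d,
                                 std::size_t bytesPerTexel) {
    return w * h * d * bytesPerTexel;
}

// Compile-time check with the values from this post.
static_assert(fitsLimits(16, 16, 1024, Image3DLimits{4096, 4096, 4096}),
              "16x16x1024 should fit within 4096x4096x4096");
```

With these numbers, imageBytes(16, 16, 1024, 4) works out to 1 MiB, so neither the dimensions nor the total size look like a plausible cause on their own.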
I also checked to make sure the CL_R and CL_FLOAT types were supported formats, which they appear to be.
At first I thought it was failing because of trying to copy the host memory, but the segfault is occurring before I even enqueue the image read.
The best I've been able to determine from my gdb back trace is that the problem appears to be in line 4074 of CL/cl.hpp:
#0 0x000000000000 in ?? ()
#1 0x00000000004274fe in cl::Image3D::Image3D (this=0x7fffffffffdcb0, context=...,
flags=140737488345384, format=..., width=0, height=140737488345392, depth=1024, row_pitch=0,
slice_pitch=0, host_ptr=0x0, err=0x0) at /usr/include/CL/cl.hpp:4074
#2 0x0000000000421986 in clCorrelationMatrixGenerator::initializeOpenCL (
this=0x7fffffffffdfa8) at ./libs/matrix_generator/OpenCLMatrixGenerator.cc:194
As you can see, the flags, width and height arguments to Image3D's constructor look wonky, but I'm not sure whether those are the real values or just artifacts of compiler optimization in the debugger output.
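One quick way to judge those values is to print them in hex, since gdb shows them in decimal. The sketch below is only an illustration: toHex and looksLikeStackAddress are hypothetical helpers, and the address range used is a rough heuristic that assumes typical x86-64 Linux user-space stack placement.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical helper: format a value the way gdb prints pointers.
std::string toHex(std::uint64_t value) {
    std::ostringstream out;
    out << "0x" << std::hex << value;
    return out.str();
}

// Rough heuristic (an assumption, not a guarantee): on x86-64 Linux the
// user-space stack usually lives just below 0x0000800000000000.
bool looksLikeStackAddress(std::uint64_t value) {
    return value >= 0x7ff000000000ULL && value < 0x800000000000ULL;
}
```

Applied to the backtrace, toHex(140737488345384ULL) gives 0x7fffffffd928 and toHex(140737488345392ULL) gives 0x7fffffffd930. Both pass the heuristic and sit 8 bytes apart in the same region as the this pointers in the trace, which suggests gdb is displaying stack locations rather than the real flags and height.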
My questions are thus:
Is there something I'm doing wrong with regard to nVidia cards that doesn't apply to the Intel CPU OpenCL driver?
Is there a known binary incompatibility between the Intel SDK and the nVidia OpenCL ICD?