I am defining a struct on the device side. Will it have the same size on GPU and CPU?
HOST SIDE:
typedef struct {
cl_float4* ptr;
} my_struct_cpu;
DEVICE SIDE:
typedef struct {
float4* ptr;
} my_struct_gpu;
typedef struct {
cl_float4* ptr;
} my_struct_cpu;
In this case the host pointer has size sizeof(cl_float4*), the same as sizeof(int*), which is either 4 or 8 bytes depending on whether your application is compiled as 32-bit or 64-bit. It is most likely 64-bit on a non-mobile device and may be either 32-bit or 64-bit on a smartphone, although this is changing.
typedef struct {
float4* ptr;
} my_struct_gpu;
On the device, sizeof(float4*) == sizeof(int*), which again can be 32-bit or 64-bit. However, OpenCL 1.x does not require the host pointer and the device pointer to be the same size, and in fact it is very common for the GPU to use 32-bit pointers. For OpenCL 2.x this may not be the case. In particular, OpenCL 2.0 introduced Shared Virtual Memory (SVM) between the CPU and GPU, and if a given OpenCL platform supports it, then the host and device pointers will indeed be the same size. Moreover, if full fine-grained SVM is supported (an additional optional capability), it is possible to pass host pointers directly to the device, e.g. inside other data structures.
Can you give an example of how you will use it?
Such a structure is basically useless on both sides, because you can't pass a pointer to one cl_mem inside another cl_mem.
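One common way around this is to store an index into a shared buffer instead of a pointer. The following is only a minimal sketch of that idea, not something from the question: `my_struct_indexed`, `float4_host` (a stand-in for cl_float4 so it compiles without the OpenCL headers) and `resolve` are all hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for cl_float4 so the sketch builds without OpenCL headers. */
typedef struct { float s[4]; } float4_host;

/* Hypothetical replacement for the pointer-carrying struct: an element
   index plus a count. uint32_t matches cl_uint on the host and uint in
   OpenCL C, so the layout does not depend on pointer width. */
typedef struct {
    uint32_t first;  /* index of the first float4 owned by this struct */
    uint32_t count;  /* number of float4 elements owned by this struct */
} my_struct_indexed;

/* Turn the index back into a pointer using whichever base address is
   valid on the current side (host array here, __global buffer in a kernel). */
static const float4_host *resolve(const float4_host *base, size_t base_len,
                                  const my_struct_indexed *s)
{
    assert((size_t)s->first + s->count <= base_len);  /* stay inside the buffer */
    return base + s->first;
}
```

The float4 data and the array of index structs would then go into two separate cl_mem buffers, and the kernel would do the same base-plus-index lookup on its own `__global` pointer.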
There is no guarantee that the size of pointers on a device will be the same as on the host. Hence there is also no guarantee that the structs will have the same size.
It is possible to query the pointer size of a specific device at runtime by calling clGetDeviceInfo with CL_DEVICE_ADDRESS_BITS as the param_name. The OpenCL specification says the following about the returned value:
The default compute device address space size specified as an unsigned integer value in bits. Currently supported values are 32 or 64 bits.
The machine I tested this on uses 64-bit pointers on the host, while 32 bits were returned for the GPU. So in that case the structs would be of different sizes.
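As a rough sketch of how the queried value could be used, the helper below compares the host struct size with the pointer size implied by an address-bits value. It deliberately does not call OpenCL, so it builds without the SDK; `address_bits` is assumed to be the cl_uint obtained from clGetDeviceInfo with CL_DEVICE_ADDRESS_BITS. The names `float4_host`, `my_struct_host`, `device_pointer_bytes` and `pointer_struct_sizes_match` are hypothetical, and the comparison assumes the device struct holds exactly one pointer with no padding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for cl_float4 and my_struct_cpu (no OpenCL headers needed). */
typedef struct { float s[4]; } float4_host;
typedef struct { float4_host *ptr; } my_struct_host;

/* Pointer size in bytes for a device reporting the given address bits.
   The specification currently lists only 32 and 64 as valid values. */
static size_t device_pointer_bytes(unsigned address_bits)
{
    assert(address_bits == 32u || address_bits == 64u);
    return address_bits / 8u;
}

/* True when the one-pointer struct would have the same size on both sides. */
static bool pointer_struct_sizes_match(unsigned address_bits)
{
    return sizeof(my_struct_host) == device_pointer_bytes(address_bits);
}
```

On a 64-bit host with a device that reports 32, as in the test above, `pointer_struct_sizes_match(32)` is false.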
Host side:
typedef struct {
cl_float4* ptr;
} my_struct_cpu;
float * SerializeForPcieSend(my_struct_cpu p[]){...}
Sending to the GPU:
array_of_floats_widthx4 -----> clEnqueueWriteBuffer
GPU side: receives the data and builds the CL-side structs:
__kernel void BuildStructs(__global float * structArr, __global my_struct_gpu * structs_in_gpu)
{
    // .... copies, computes, builds
}
Then the GPU computes on this new data (the kernel needs a name different from the first one):
__kernel void ComputeOnStructs(__global my_struct_gpu * structs_in_gpu, __global float * responseArray)
{
    // .... computes, extracts elements and puts in response array
}
Host side: then the CPU takes the results:
clEnqueueReadBuffer
array_of_floats_widthx4 <------------- response array
Don't forget to double-check the sizes of buffer reads and writes against sizeof(cl_float)*num_elements.
Then rebuild the host-side objects:
my_struct_cpu * DeserializeAfterPcieDownload(float * p){...}
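The two host-side helpers are elided above, so here is only a minimal sketch of what the flatten and rebuild steps might look like. It assumes each struct points at exactly one float4 and uses caller-provided buffers instead of returning allocated memory. `float4_host` is a stand-in for cl_float4, and `serialize_structs` and `deserialize_structs` are hypothetical names, not the functions from this answer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for cl_float4 so the sketch builds without OpenCL headers. */
typedef struct { float s[4]; } float4_host;
typedef struct { float4_host *ptr; } my_struct_cpu;

/* Copy the float4 behind each struct into a flat array of 4*n floats,
   which is what would be handed to clEnqueueWriteBuffer. */
static void serialize_structs(const my_struct_cpu *in, size_t n, float *out)
{
    for (size_t i = 0; i < n; ++i) {
        assert(in[i].ptr != NULL);
        memcpy(out + 4 * i, in[i].ptr->s, 4 * sizeof(float));
    }
}

/* Inverse step after clEnqueueReadBuffer: copy 4 floats per element into
   caller-provided storage and point each struct at its own slot. */
static void deserialize_structs(const float *in, size_t n,
                                float4_host *storage, my_struct_cpu *out)
{
    for (size_t i = 0; i < n; ++i) {
        memcpy(storage[i].s, in + 4 * i, 4 * sizeof(float));
        out[i].ptr = &storage[i];
    }
}
```

Only plain floats cross the bus in this scheme, so the differing pointer widths never matter; the pointers are recreated locally on each side.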
Short answer: no, not always. But if you place the larger members first in the struct and the endianness is the same on both sides, then there should not be a problem on some GPUs. If there are float3 types, you should not send them directly, since a float3 occupies 16 bytes on the device, the same as a float4.
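For the float3 case, one option is to widen the data on the host before uploading. The sketch below assumes the host keeps tightly packed 3-float vectors in a hypothetical `vec3_host` type (12 bytes) and expands them to a 4-float stride with a zero in the unused lane; `pad_vec3_to_vec4` is likewise a made-up name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tightly packed host vector: three floats, no padding lane. */
typedef struct { float x, y, z; } vec3_host;

/* Expand n packed 3-float vectors to a 4-float stride so each element
   lines up with a 16-byte float3/float4 on the device.
   out must have room for 4*n floats. */
static void pad_vec3_to_vec4(const vec3_host *in, size_t n, float *out)
{
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; ++i) {
        out[4 * i + 0] = in[i].x;
        out[4 * i + 1] = in[i].y;
        out[4 * i + 2] = in[i].z;
        out[4 * i + 3] = 0.0f;  /* padding lane, ignored by the kernel */
    }
}
```

The padded array can then be written with a buffer size of sizeof(cl_float)*4*n.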