Size of pointer on GPU vs. size of pointer on CPU

Question

I am defining a struct on the device side. Will it have the same size on GPU and CPU?

HOST SIDE:

typedef struct {
    cl_float4* ptr;
} my_struct_cpu;

DEVICE SIDE:

typedef struct {
    float4* ptr;
} my_struct_gpu;

score 1 · Accepted Answer · answered Mar 27 '15 at 21:31

typedef struct {
    cl_float4* ptr;
} my_struct_cpu;

In this case the host pointer has size sizeof(cl_float4*), which is the same as sizeof(int*): either 4 or 8 bytes, depending on whether your application is compiled as 32-bit or 64-bit. It is most likely 64-bit on a non-mobile device and may be either 32-bit or 64-bit on a smartphone, although this is changing.

typedef struct {
    float4* ptr;
} my_struct_gpu;

sizeof(int*) == sizeof(float4*) on the device, which again can be 32 or 64 bits. However, OpenCL 1.x does not require the host pointer and the device pointer to be the same size, and in fact it is very common for the GPU to use 32-bit pointers. For OpenCL 2.x this may not be the case. In particular, OpenCL 2.0 introduced Shared Virtual Memory (SVM) between the CPU and GPU, and if a given OpenCL platform supports it, the host and device pointers will indeed be the same size. Moreover, if full fine-grained SVM is supported (an additional optional capability), it is possible to pass host pointers directly to the device, e.g. inside other data structures.

score 0 · Answer 2 · answered Mar 27 '15 at 14:58

Can you give an example of how you will use it?

Such a structure is basically useless for sharing between the two sides, because you can't store a pointer to one cl_mem inside another cl_mem.
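One common way around this limitation is to store an element index instead of a pointer, so the struct has the same size on both sides. The following is only a sketch under that assumption; `my_struct_indexed` and `index_of` are hypothetical names, and the buffer is assumed to hold float4 values laid out as four consecutive floats.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: a fixed-width element index replaces the pointer,
   so the struct does not depend on the pointer width of host or device. */
typedef struct {
    uint32_t first_element;  /* index into a separate float4 buffer */
} my_struct_indexed;

/* Hypothetical helper: converts a host pointer into a float4 index relative
   to the start of the array that is uploaded as its own buffer.
   Assumes `element` points at a multiple of four floats past `base`. */
uint32_t index_of(const float *base, const float *element)
{
    return (uint32_t)((element - base) / 4);
}
```

On the device, a kernel would then receive the float4 buffer as a separate `__global` argument and use the stored index to address it.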

Roman Arzumanyan

So, you are right! Sharing a struct like this is not useful. But suppose I have a struct like `typedef struct { __global float4* ptr; } my_struct_gpu;` and want to use it as a kernel argument, e.g. `__kernel void mykernel(__global my_struct_gpu* data)`. Then I have to allocate memory for `data` on the host side, and I wanted to know how much memory I should allocate for the pointer member on the GPU. Is it the same as on the CPU (4 bytes) or different? – mmostajab Mar 27 '15 at 16:03

bbastu · Answer 3 · 2015-03-27T21:49:17.987

There is no guarantee that the size of pointers on a device will be the same as on the host. Hence, there is also no guarantee that the structs will have the same size.

It is possible to query the pointer size of a specific device at runtime by calling clGetDeviceInfo with CL_DEVICE_ADDRESS_BITS as the param_name. The OpenCL specification says the following about the returned value:

The default compute device address space size specified as an unsigned integer value in bits. Currently supported values are 32 or 64 bits.

On the machine I tested this on, the host uses 64-bit pointers, while 32 bits were returned for the GPU. So in that case the structs would be of different sizes.
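As a rough illustration of how the queried value might be used, the sketch below converts it to bytes and compares it with the host pointer size. It assumes `address_bits` has already been filled in by a call such as `clGetDeviceInfo(device, CL_DEVICE_ADDRESS_BITS, sizeof(cl_uint), &address_bits, NULL)`; the helper names are hypothetical, and the OpenCL call itself is left out so the snippet compiles without the OpenCL headers.

```c
#include <stddef.h>

/* Hypothetical helper: converts the value reported for
   CL_DEVICE_ADDRESS_BITS (32 or 64) into a pointer size in bytes.
   Returns 0 for any value the specification does not list. */
size_t device_pointer_bytes(unsigned int address_bits)
{
    if (address_bits == 32u || address_bits == 64u) {
        return address_bits / 8u;
    }
    return 0;
}

/* Hypothetical helper: nonzero when a struct holding a single pointer would
   occupy the same number of bytes on the host and on the device. */
int pointer_struct_sizes_match(unsigned int address_bits)
{
    return device_pointer_bytes(address_bits) == sizeof(void *);
}
```

With the values from the test above (64-bit host, 32 bits reported for the GPU), `pointer_struct_sizes_match` would return 0, so the struct could not be copied byte for byte.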

Yes. cl_float4 and float4 should have the same size, but the question is whether a pointer to cl_float4 on the CPU has the same size as a pointer to float4 on the GPU. For example, in C++ a pointer takes 4 bytes, – mmostajab, Mar 27 '15 at 15:40

score 0 · Answer 4 · answered Mar 27 '15 at 20:31

Host side:

  typedef struct {
     cl_float4* ptr;
  } my_struct_cpu;

  float * SerializeForPcieSend(my_struct_cpu p[]){...}

sending to gpu:

 array_of_floats_widthx4 -----> clEnqueueWriteBuffer

gpu side: receives and builds cl-side structs:

 __kernel void BuildStructs(__global float * structArr, __global my_struct_gpu * structs_in_gpu)
  {
         .... copies, computes, builds

  }

then gpu computes on this new data

   __kernel void ComputeOnStructs(__global my_struct_gpu * structs_in_gpu, __global float * responseArray)
  {
         .... computes, extracts elements and puts in response array

  }

host side: then cpu takes the results

  clEnqueueReadBuffer
  array_of_floats_widthx4 <------------- response array

Don't forget to double-check the sizes of buffer reads and writes against sizeof(cl_float)*num_elements.

Then re-build host-side objects

    my_struct_cpu * DeserializeAfterPcieDownload(float * p){...}

Short answer: no, not always. But if you place the larger members first in the struct and the endianness is the same on both sides, there should be no problem on some GPUs. If the struct contains float3 types, you should not send it directly.
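The host-side serialize and deserialize steps above could look roughly like the following. This is only a sketch: `host_float4` is a stand-in for cl_float4 so the code compiles without the OpenCL headers, and `serialize_structs` / `deserialize_structs` are hypothetical names, not the functions from the outline.

```c
#include <stddef.h>

/* Stand-in for cl_float4 (assumption: only the four float lanes matter). */
typedef struct { float s[4]; } host_float4;

typedef struct {
    host_float4 *ptr;
} my_struct_cpu;

/* Hypothetical serializer: copies the float4 each struct points to into a
   flat array of 4 * count floats. The caller owns `out`. */
void serialize_structs(const my_struct_cpu *in, size_t count, float *out)
{
    for (size_t i = 0; i < count; ++i) {
        for (size_t k = 0; k < 4; ++k) {
            out[4 * i + k] = in[i].ptr->s[k];
        }
    }
}

/* Hypothetical inverse: writes the flat floats read back from the device
   through the pointers already stored in each host struct. */
void deserialize_structs(const float *in, size_t count, my_struct_cpu *out)
{
    for (size_t i = 0; i < count; ++i) {
        for (size_t k = 0; k < 4; ++k) {
            out[i].ptr->s[k] = in[4 * i + k];
        }
    }
}
```

The flat array contains no pointers, so its layout does not depend on the pointer width of either side; its size in bytes is `sizeof(float) * 4 * count`.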
