For OpenCL, specifically: What is the maximum size that a local float array can be?

I set up the kernel like this:

__kernel void mykern( unsigned int N, __global float* input, __global float* output, __local float* sdata )
{
    // ...
}

What is the largest size I can give sdata (in OpenCL)?

I did the following in C++ OpenCL:

clSetKernelArg(kf_myvred,3,(lws[0])*sizeof(cl_float),NULL);
clEnqueueNDRangeKernel(mycommandq,kf_myvred,1,NULL,work,lws,0,NULL,NULL);

If the size is too big, then clEnqueueNDRangeKernel returns an error of CL_OUT_OF_RESOURCES. But I'm not sure what the limit is.

Luis B
  • If I use the global_work_size as the size, it gives me an error. But if I use the local_work_size, it does not give me an error. – Luis B Sep 10 '16 at 01:05
  • The C++ standard does not specify the maximum size of an array. The practical maximum size depends on your C++ implementation, and will obviously vary depending on how much memory is available at the given time. – Sam Varshavchik Sep 10 '16 at 01:09
  • @SamVarshavchik Does it specify a size that all implementations are required to support? – Barmar Sep 10 '16 at 01:11
  • In many implementations, the limit is actually the size of a stack frame. So it's a limit on the combined size of all local variables, not any single array. – Barmar Sep 10 '16 at 01:12
  • No. The C++ standard does not specify that. – Sam Varshavchik Sep 10 '16 at 01:12
  • Thanks for the responses. I'm talking about OpenCL specifically. Are you guys talking about C++ in general, or local arrays in an OpenCL kernel? I'm talking about "local arrays" in an OpenCL kernel. – Luis B Sep 10 '16 at 01:25
  • When I refer to a local array, I refer to something like this: http://stackoverflow.com/questions/17574570/create-local-array-dynamic-inside-opencl-kernel – Luis B Sep 10 '16 at 01:29

1 Answer

Use clGetDeviceInfo with the CL_DEVICE_LOCAL_MEM_SIZE parameter to query the local memory size of your OpenCL device. The value is returned in bytes as a cl_ulong, and it is typically between 32 and 64 KB.

doqtor
  • Thanks for the prompt response. I ran this code: void print_device_local_mem_size(cl_device_id *mydevice) { size_t size; cl_ulong local_mem_size; clGetDeviceInfo(*mydevice, CL_DEVICE_LOCAL_MEM_SIZE, 0, NULL, &size); clGetDeviceInfo(*mydevice, CL_DEVICE_LOCAL_MEM_SIZE, size, &local_mem_size, NULL); printf("size: %lu ,, bytes: %lu\n", size, local_mem_size); } And got this result: size: 8 ,, bytes: 49152. What does this result mean? – Luis B Sep 10 '16 at 21:29
  • It means that a maximum of 49152 bytes (48 KiB) of local memory can be allocated per workgroup. – doqtor Sep 11 '16 at 05:21
  • So I only have ~49KB of local memory per workgroup? That sounds pretty low to me. Does that sound correct? – Luis B Sep 11 '16 at 12:18
  • That's about right. I have some GPUs that only offer 16KiB, so 48KiB is already quite generous. Don't forget that if you use all of this, no extra work groups can be scheduled (via SMT) on that execution unit, so performance will suffer. So being very frugal with local memory will usually help throughput. – pmdj Sep 11 '16 at 15:58
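
To turn the byte count discussed above into an element count for sdata, the arithmetic can be sketched roughly as follows. This is only an illustration: max_local_elems is a hypothetical helper, not part of the OpenCL API. It assumes device_local_bytes is the value reported for CL_DEVICE_LOCAL_MEM_SIZE by clGetDeviceInfo, and that kernel_local_bytes is the value reported for CL_KERNEL_LOCAL_MEM_SIZE by clGetKernelWorkGroupInfo before the __local argument size has been set (pass 0 if you do not query it).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not an OpenCL function).
 *
 * device_local_bytes: local memory per workgroup in bytes, assumed to come
 *                     from clGetDeviceInfo(..., CL_DEVICE_LOCAL_MEM_SIZE, ...).
 * kernel_local_bytes: local memory the kernel already uses, assumed to come
 *                     from clGetKernelWorkGroupInfo(..., CL_KERNEL_LOCAL_MEM_SIZE, ...).
 * elem_size:          size of one array element, e.g. sizeof(cl_float).
 *
 * Returns how many elements still fit into local memory. */
uint64_t max_local_elems(uint64_t device_local_bytes,
                         uint64_t kernel_local_bytes,
                         size_t elem_size)
{
    assert(elem_size > 0);

    /* Nothing is left if the kernel already uses the whole budget. */
    if (kernel_local_bytes >= device_local_bytes)
        return 0;

    /* Integer division rounds down to whole elements. */
    return (device_local_bytes - kernel_local_bytes) / elem_size;
}
```

With the 49152 bytes reported above, 4-byte floats, and no other local memory in use, this gives 12288 elements, so the size passed to clSetKernelArg for sdata would need to stay at or below 12288 * sizeof(cl_float). As pmdj notes, using the full budget may still hurt throughput.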