In CUDA to cover multiple blocks, and thus incerase the range of indices for arrays we do some thing like this:
Host side Code:
dim3 dimgrid(9,1)// total 9 blocks will be launched
dim3 dimBlock(16,1)// each block is having 16 threads // total no. of threads in
// the grid is thus 16 x9= 144.
Device side code
...
...
idx=blockIdx.x*blockDim.x+threadIdx.x;// idx will range from 0 to 143
a[idx]=a[idx]*a[idx];
...
...
What is the equivalent in OpenCL for acheiving the above case ?