Better = faster.
I am asking in general, but consider a case where I have more "workers" than data -- is it better to leave the last threads of each block unused, or to leave the last blocks of the grid unused?
Keep in mind that blocks are scheduled onto SMs (streaming multiprocessors), and up to 8 blocks can be resident on one SM at a time (the exact limit depends on the GPU's compute capability). You can think of SMs as CPU cores. Each block can currently have up to 1024 threads, which are comparable to logical cores, like the ones on current Intel i-series CPUs. If a block does not use all of its threads, the rest are wasted: you are not using them, and no one else can. So, for example, if you have 8 SMs on your GPU, 64 blocks can be resident at once, but then you can't give each of them 1024 threads, because there is also a limit on the total number of threads per SM, for example 2048. (Edited based on the information that hubs gave.)
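To make the arithmetic above concrete, here is a minimal host-side sketch in plain C++. The three limits are just the example figures from this answer (8 blocks per SM, 1024 threads per block, 2048 threads per SM), not values for any particular GPU, and the function names are made up for illustration. It also ignores register and shared-memory limits, which can lower the numbers further.

```cpp
// Illustrative limits taken from the answer above (assumptions, not queried
// from a device; real values depend on the GPU's compute capability).
constexpr int kMaxThreadsPerBlock = 1024;
constexpr int kMaxThreadsPerSM    = 2048;
constexpr int kMaxBlocksPerSM     = 8;

// Hypothetical helper: number of blocks needed to cover n elements (rounds up).
constexpr int blocks_needed(int n, int threads_per_block) {
    return (n + threads_per_block - 1) / threads_per_block;
}

// Hypothetical helper: how many blocks of this size fit on one SM at once,
// considering only the thread-count and block-count limits.
constexpr int resident_blocks_per_sm(int threads_per_block) {
    return (kMaxThreadsPerSM / threads_per_block < kMaxBlocksPerSM)
               ? kMaxThreadsPerSM / threads_per_block
               : kMaxBlocksPerSM;
}

// Hypothetical helper: launched threads that have no element to work on.
constexpr int idle_threads(int n, int threads_per_block) {
    return blocks_needed(n, threads_per_block) * threads_per_block - n;
}

// Compile-time check that the block sizes used below respect the block limit.
static_assert(1024 <= kMaxThreadsPerBlock, "block size exceeds the assumed limit");
```

Under these assumed limits, `resident_blocks_per_sm(1024)` is 2 (the thread limit per SM is hit first), while `resident_blocks_per_sm(128)` is 8 (the block limit is hit first, so only 1024 of the 2048 thread slots are filled). For 1000 elements with 256 threads per block, `blocks_needed` gives 4 and `idle_threads` gives 24.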