
According to our textbook, a Fermi SM can hold up to 1536 resident threads.

Suppose I launch a kernel like this:

kernel<<<8, 1024>>>();

If all 8 blocks were placed on the same SM, there wouldn't be enough thread slots, since 1024 * 8 = 8192 > 1536. If instead I launch the kernel like this:

kernel<<<8, 10>>>();

Then all the blocks could fit on the same SM (and perhaps save resources? I'm not sure). So why don't we need to specify which SM the blocks run on?

talonmies
  • Thanks for telling me. So is it true that the system automatically chooses how many SMs to use, and which ones, so that the number of threads on each SM stays within the limit? – BoneyardCreeker Dec 07 '13 at 22:59
  • Roughly speaking, Fermi has a GigaThread scheduler that dispatches blocks to the SMs, and each SM has a dual warp scheduler that issues the warps. This happens transparently to the user, and the schedulers are not documented. Concerning your particular example (8 blocks of 10 threads each), I don't think all the blocks will necessarily be executed by the same SM. If, for example, each block uses too many registers, the schedulers may choose to distribute the blocks across several SMs. You may wish to experiment with the CUDA Occupancy Calculator to get familiar with this problem. – Vitality Dec 07 '13 at 23:45
