
For the NVIDIA GeForce 940MX GPU, deviceQuery shows that it has 3 multiprocessors with 128 cores each.

Maximum number of threads per multiprocessor = 2048

So 3 * 2048 = 6144, i.e. 6144 threads in total on the GPU.

6144 / 1024 = 6, i.e. 6 blocks in total. The warp size is 32.

But from this video https://www.youtube.com/watch?v=kzXjRFL-gjo I learned that each GPU has a limit on the number of threads but no limit on the number of blocks.

So I am confused by this. I would like to know:

  1. How many threads does my GPU have in total? Can all of them be used to execute a program?
  2. How many blocks and grids are there?
9113303
  • GPUs generally don't place significant limits on the total number of threads, or the total number of blocks. These are not properties of the hardware, generally speaking, but attributes of the code you write. All currently available CUDA GPUs can support at least billions of blocks and at least trillions of threads (in total). You'll need to get rid of the mindset that thinks there is a rigid connection between these ideas and GPU hardware. – Robert Crovella Jun 26 '18 at 05:29
  • The 6144 number you have calculated is the maximum instantaneous capacity of your GPU, but it has no bearing on how many blocks or threads you can launch. – Robert Crovella Jun 26 '18 at 05:35
  • So we can only state limits such as the maximum number of threads per block, not a total number of threads or a total number of blocks. – 9113303 Jun 26 '18 at 05:41
  • Yes, the maximum number of threads per block is a defined hardware limit. – Robert Crovella Jun 26 '18 at 06:34
  • @RobertCrovella But there should be a limit on the number of threads for a task. We cannot set the number of threads however we wish. What is the limit on my GPU? Is it 6144 or more? – 9113303 Jul 30 '18 at 09:16

1 Answer


It appears the main source of your confusion is mixing up two completely different sets of limits:

  1. The maximum number of threads and blocks which can run concurrently on the GPU.
  2. The maximum number of threads and blocks which can be launched for a given kernel.

The numbers you quote (2048 threads per multiprocessor, three multiprocessors in total = 6144 threads) represent the first set of limits. The numbers you show in your screenshot of the deviceQuery output:

  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)

define the limits of a given kernel launch. While they overlap somewhat, you can treat them as more or less separate. For a more thorough discussion of the practicalities of kernel launch parameters and block dimensions, see here.
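
To make the difference between the two sets of limits concrete, here is a minimal sketch of the arithmetic in plain host-side C++ (no CUDA runtime involved). The constants are copied from the deviceQuery output quoted above; the function names and the `blocks_for` helper are hypothetical and only there for illustration, assuming the common pattern of one thread per data element.

```cpp
#include <cassert>
#include <cstdint>

// Values taken from the deviceQuery output quoted in this thread (GeForce 940MX).
constexpr std::uint64_t kNumMultiprocessors = 3;
constexpr std::uint64_t kMaxThreadsPerMultiprocessor = 2048;
constexpr std::uint64_t kMaxThreadsPerBlock = 1024;
constexpr std::uint64_t kMaxGridDimX = 2147483647;

// Limit set 1: how many threads can be resident on the GPU at the same time.
constexpr std::uint64_t resident_threads() {
    return kNumMultiprocessors * kMaxThreadsPerMultiprocessor;
}

// Limit set 2: how many threads a single launch may request when only the
// x dimension of the grid is used (the y and z dimensions allow even more).
constexpr std::uint64_t max_launch_threads_x() {
    return kMaxGridDimX * kMaxThreadsPerBlock;
}

// Hypothetical helper: number of blocks needed so that n elements each get
// one thread. The block count follows from the problem size, not from the
// number of multiprocessors.
inline std::uint64_t blocks_for(std::uint64_t n, std::uint64_t block_size) {
    assert(block_size > 0 && block_size <= kMaxThreadsPerBlock);
    return (n + block_size - 1) / block_size;  // round up
}

// The figure computed in the question is the concurrent capacity only.
static_assert(resident_threads() == 6144, "3 * 2048");
// A single launch may ask for far more threads than can be resident at once.
static_assert(max_launch_threads_x() > resident_threads(), "launch limit is larger");
```

With these numbers, a problem of one million elements and 1024 threads per block needs `blocks_for(1000000, 1024)` = 977 blocks, well above the 6 blocks that can be resident at full occupancy. The hardware simply schedules the remaining blocks as earlier ones finish.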

talonmies
  • But **cat /proc/sys/kernel/threads-max** gives the maximum number of threads, and I got 126906. How is that possible? – 9113303 Jul 16 '18 at 12:00
  • @9113303: that number has nothing to do with your GPU, that is how it is possible. Read Appendix H of the CUDA programming guide – talonmies Jul 16 '18 at 12:05
  • <<<>>> defines the kernel launch. What should nblock and blocksize be in my case? Does that depend on the task? – 9113303 Jul 30 '18 at 09:17