NVIDIA Architecture: CUDA threads and thread blocks

Question

This is mostly from the book "Computer Architecture: A Quantitative Approach."

The book states that groups of 32 threads are grouped and executed together in what's called the thread block, but shows an example with a function call that has 256 threads per thread block, and CUDA's documentation states that you can have a maximum of 512 threads per thread block.

The function call looks like this:

int nblocks = (n+255)/256;
daxpy<<<nblocks,256>>>(n,2.0,x,y);

Could somebody please explain how thread blocks are structured?

https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#thread-hierarchy — Robert Crovella, May 13 '20 at 01:52
You are mixing up the concepts of warp size and block size. Read about the CUDA execution model in the official programming guide linked above, and see [this discussion](https://stackoverflow.com/a/10467342/7852589) and the many other discussions easily found on the internet. Also, you can have a maximum of 1024 threads per block. See the limits in the [deviceQuery](https://docs.nvidia.com/cuda/cuda-samples/index.html#device-query) printout. — If_You_Say_So, May 13 '20 at 02:34

score 1 · Accepted Answer · answered May 13 '20 at 02:52

The question is a little unclear in my opinion. I will highlight a difference between thread warps and thread blocks that I find important in hopes that it helps answer whatever the true question is.

The number of threads per warp is defined by the hardware. On NVIDIA GPUs a warp is 32 threads wide, often because the SIMD unit on the GPU has exactly 32 lanes of execution, each with its own ALU (this is not always the case as far as I know; some architectures have only 16 lanes even though warps are 32 threads wide).
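To see the hardware-defined numbers on a particular GPU, something like the following minimal sketch could be used. It relies on the CUDA runtime call `cudaGetDeviceProperties`; the helper name `query_limits` and the use of device index 0 in the usage note are assumptions made for illustration, not part of the original discussion.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper (not from the thread): reads the two limits discussed
// here for a given device. Returns the CUDA error code so callers can check it.
inline cudaError_t query_limits(int device, int* warp_size, int* max_threads_per_block) {
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) return err;
    *warp_size = prop.warpSize;                        // fixed by the hardware
    *max_threads_per_block = prop.maxThreadsPerBlock;  // upper bound on the user's choice
    return cudaSuccess;
}
```

Calling `query_limits(0, &w, &m)` and printing `w` and `m` should report the same warp size and threads-per-block limit that the deviceQuery sample prints for that device.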

The size of a thread block is user-defined, although constrained by the hardware. Whatever block size you choose, the hardware will still execute the thread code in 32-wide warps, so a 256-thread block runs as 8 warps. Some GPU resources, such as shared memory and synchronization, cannot be shared arbitrarily between any two threads on the GPU. However, the GPU will allow threads to share a larger subset of resources if they belong to the same thread block. That's the main idea behind why thread blocks are used.
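As a rough illustration of the block/warp distinction, here is a minimal sketch in which each 256-thread block sums its slice of an array through shared memory. The names `blocks_needed`, `warps_per_block`, and `block_sum` are hypothetical, the warp size of 32 is hard-coded as an assumption, and the kernel assumes it is launched with exactly `kThreadsPerBlock` threads per block (a power of two).

```cuda
#include <cuda_runtime.h>

constexpr int kWarpSize = 32;          // assumption: warp width on NVIDIA GPUs
constexpr int kThreadsPerBlock = 256;  // user-chosen, as in the question's launch

// Blocks needed to cover n elements; same rounding as (n+255)/256.
__host__ __device__ inline int blocks_needed(int n, int threads_per_block) {
    return (n + threads_per_block - 1) / threads_per_block;
}

// Number of warps the hardware splits one block into.
__host__ __device__ inline int warps_per_block(int threads_per_block) {
    return (threads_per_block + kWarpSize - 1) / kWarpSize;
}

// Each block reduces its own slice of x into partial[blockIdx.x].
// Assumes blockDim.x == kThreadsPerBlock.
__global__ void block_sum(int n, const double* x, double* partial) {
    __shared__ double tile[kThreadsPerBlock];  // visible only within this block
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    tile[threadIdx.x] = (i < n) ? x[i] : 0.0;
    __syncthreads();  // barrier across all warps of this block, not the whole grid

    // Tree reduction: halve the number of active threads each step.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride) {
            tile[threadIdx.x] += tile[threadIdx.x + stride];
        }
        __syncthreads();
    }
    if (threadIdx.x == 0) partial[blockIdx.x] = tile[0];
}
```

A launch would look like `block_sum<<<blocks_needed(n, kThreadsPerBlock), kThreadsPerBlock>>>(n, d_x, d_partial);`, where `d_x` and `d_partial` are device pointers the caller has allocated. The shared array and the `__syncthreads()` barrier work across the 8 warps of one block, but threads in different blocks cannot cooperate this way.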

Thank you. The issue I was having was understanding the differences between thread blocks and warps. — areed, May 13 '20 at 04:38
