The NVIDIA GPU architecture specifies that a warp has a fixed size of 32 threads. How, then, are the threads of a thread block split into warps?

For a one-dimensional thread block such as (128, 1), it looks like the threads along the x dimension are grouped sequentially into warps of 32. But how does this work for other block shapes, like (16, 2)? Will those 32 threads map to a single warp in that case?
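My current understanding, based on the CUDA Programming Guide, is that thread indices are linearized x-fastest (then y, then z) and consecutive groups of 32 linear IDs form a warp. A quick sketch in plain Python simulating that rule (the function names here are just illustrative, not CUDA APIs):

```python
# Simulate the documented CUDA linearization: linear ID =
# x + y * Dx + z * Dx * Dy, with each consecutive run of 32
# linear IDs forming one warp.
WARP_SIZE = 32

def linear_thread_id(tx, ty, tz, dim_x, dim_y):
    return tx + ty * dim_x + tz * dim_x * dim_y

def warp_id(tx, ty, tz, dim_x, dim_y):
    return linear_thread_id(tx, ty, tz, dim_x, dim_y) // WARP_SIZE

# A (16, 2) block has 16 * 2 = 32 threads: all land in warp 0.
warps_16x2 = {warp_id(tx, ty, 0, 16, 2) for ty in range(2) for tx in range(16)}
print(warps_16x2)  # {0}

# A (128, 1) block splits into 4 warps of 32 along x.
warps_128 = sorted({warp_id(tx, 0, 0, 128, 1) for tx in range(128)})
print(warps_128)  # [0, 1, 2, 3]
```

Is this the right mental model for arbitrary block shapes?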