Questions of resident warps of CUDA

Question

I have been using CUDA for a month, and I am now trying to work out how many warps/blocks are needed to hide the latency of memory accesses. I think this is related to the maximum number of resident warps on a multiprocessor.

According to Table 13 in the CUDA C Programming Guide (v7.5), the maximum number of resident warps per multiprocessor is 64. My question is: what is a resident warp? Does it refer to warps whose data has already been read from GPU memory and that are ready to be processed by the SPs? Or does it refer to warps that can either read memory for data or are ready to be processed by the SPs, which would mean that any warps beyond those 64 can neither read memory nor be processed by the SPs until some of the 64 resident warps are done?

score 2 · Answer 1 · edited Jun 20 '20 at 09:12

The maximum number of resident warps is the maximum number of warps that can be processed in parallel on the multiprocessor. A warp is active when it is scheduled by the warp scheduler and registers have been allocated.

If you manage to have this number of warps running in parallel, you reach the theoretical maximum occupancy (100%, or 1:1). Otherwise, the occupancy ratio is lower.

Other warps will have to wait.
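To make the occupancy ratio concrete, here is a minimal sketch of the arithmetic. It assumes the 64 resident warps per multiprocessor quoted in the question, a hypothetical limit of 32 resident blocks per multiprocessor (this varies by compute capability), and it ignores register and shared-memory limits entirely. The function name and parameters are made up for illustration.

```python
import math

WARP_SIZE = 32  # threads per warp


def theoretical_occupancy(threads_per_block, max_warps_per_sm=64, max_blocks_per_sm=32):
    """Return (resident_warps, occupancy) for one multiprocessor.

    Assumption: only the warp and block limits matter; register and
    shared-memory limits are ignored to keep the sketch short.
    """
    # A partially filled warp still occupies a full warp slot.
    warps_per_block = math.ceil(threads_per_block / WARP_SIZE)
    # Blocks become resident as a whole, so the warp limit applies per block.
    blocks_by_warp_limit = max_warps_per_sm // warps_per_block
    resident_blocks = min(blocks_by_warp_limit, max_blocks_per_sm)
    resident_warps = resident_blocks * warps_per_block
    return resident_warps, resident_warps / max_warps_per_sm
```

Under these assumed limits, 256 threads per block fills all 64 warp slots, while 32 threads per block is capped by the block limit at 32 resident warps, i.e. 50% occupancy. Real kernels are often limited by registers or shared memory first, so treat this as an upper bound only.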

Might be related to this question on SO.

Edited answer for further questions:

Warps

About the maximum number of warps that can be processed: each SM (streaming multiprocessor) has a maximum number of processing cores, and the GPU has a limited number of SMs. Even though this webinar is not up to date with newer architectures, it gives some good examples:

SM – Streaming multi-processors with multiple processing cores

Each SM contains 32 processing cores

Execute in a Single Instruction Multiple Thread (SIMT) fashion

Up to 16 SMs on a card for a maximum of 512 compute cores

And:

Fermi can have up to 48 active warps per SM (1536 threads)
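The figures quoted above can be checked with a few lines of arithmetic. This is only a sketch using the numbers from the webinar excerpt; the helper names are hypothetical.

```python
WARP_SIZE = 32  # threads per warp


def threads_for_warps(warps):
    """Number of threads held by a given number of warps."""
    return warps * WARP_SIZE


def total_cores(sms, cores_per_sm):
    """Number of processing cores on a card with identical SMs."""
    return sms * cores_per_sm


def resident_threads_per_core(warps_per_sm, cores_per_sm):
    """How many resident threads share each core of one SM."""
    return threads_for_warps(warps_per_sm) / cores_per_sm
```

With the Fermi numbers, 48 warps correspond to 1536 threads, and 16 SMs of 32 cores give 512 cores. One SM therefore holds 48 times more resident threads than it has cores, which suggests that resident warps cannot all be executing an instruction in the same cycle: most of them are waiting for their turn or for data.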

Processing warps

First, some of these terms are not always clearly official; see for example this topic from Nvidia DevTalk.

As explained in that topic, a given warp is active once it has been allocated on the SM with its resources. It can then be:

eligible: it can issue an operation
stalled: it cannot, because of a resource/data dependency

This is possible because the architecture is SIMT, meaning Single Instruction, Multiple Threads. You will find plenty of reading on this topic that can be very useful if you plan on tuning occupancy.
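To illustrate how eligible and stalled warps relate to latency hiding, here is a toy model, not a description of the real hardware. It assumes a single scheduler that issues at most one warp instruction per cycle, and that every issued instruction stalls its warp for a fixed number of cycles (real GPUs have several schedulers, variable latencies and instruction-level parallelism). All names are hypothetical.

```python
def issue_utilization(resident_warps, latency_cycles, total_cycles=10_000):
    """Fraction of cycles in which the toy scheduler issues an instruction.

    Assumptions: one issue slot per cycle, and each warp is stalled for
    `latency_cycles` cycles after it issues.
    """
    # Cycle at which each resident warp becomes eligible again.
    ready_at = [0] * resident_warps
    issued = 0
    for cycle in range(total_cycles):
        for warp in range(resident_warps):
            if ready_at[warp] <= cycle:  # this warp is eligible
                # It issues now, then stalls waiting for its result.
                ready_at[warp] = cycle + latency_cycles
                issued += 1
                break  # only one issue slot per cycle
        # If no warp was eligible, the cycle is wasted.
    return issued / total_cycles
```

In this model a single resident warp with a 100-cycle latency keeps the scheduler busy 1% of the time, 50 warps reach 50%, and 100 warps hide the latency completely. Note that a warp stays resident the whole time, whether it is eligible or stalled: it does not need its operands ready to be resident.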

edited Jun 20 '20 at 09:12

Community

answered Jan 12 '17 at 08:34

Taro


I'm still confused. You mentioned "The maximum amount of resident warp is the maximum number of warps that can be processed in parallel on the multiprocessor. A warp is active when it is scheduled by warp scheduler and registers have been allocated". My questions are: 1. Is a warp called resident when it is active? – Falofter Jan 12 '17 at 12:13
2. What does "processed" in "processed by the multiprocessor" mean? Does it mean that the resident warp has its operands ready to be calculated? Or does it mean that the resident warp is active and can either read operands or compute on them? What confuses me the most is: does being a resident warp mean that the warp doesn't need to read operands, because they are ready before it becomes resident? In other words, can a warp not become resident until it has its operands ready? @Taro – Falofter Jan 12 '17 at 12:35
I updated my answer with some clarifications. If you still feel you don't understand everything, you should read all the sources I linked in full, as they answer (almost) everything you might wonder about the different states a warp can be in. – Taro Jan 12 '17 at 13:42
Many thanks for your help. I haven't read them in depth yet, but the links you provided are exactly relevant to my confusion. I'll keep working through it. – Falofter Jan 12 '17 at 14:08
Glad I could help :) – Taro Jan 12 '17 at 15:06
