The maximum number of resident warps is the maximum number of warps that can be resident, and therefore processed concurrently, on one multiprocessor.
A warp is active when it is scheduled by the warp scheduler and its registers have been allocated.
If you manage to have that many warps active at the same time, you reach the theoretical maximum occupancy (100%, or 1:1).
If not, the occupancy ratio is lower.
Other warps will have to wait.
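To make the ratio concrete, here is a minimal sketch of the occupancy calculation described above. The function name `occupancy` and its parameters are only illustrative (they are not part of any CUDA API), and the value 48 is the Fermi per-SM limit quoted in this answer; other architectures have other limits.

```python
def occupancy(active_warps: int, max_resident_warps: int) -> float:
    """Ratio of active warps to the maximum number of resident warps on one SM."""
    if max_resident_warps <= 0:
        raise ValueError("max_resident_warps must be positive")
    if not 0 <= active_warps <= max_resident_warps:
        raise ValueError("active_warps must be between 0 and max_resident_warps")
    return active_warps / max_resident_warps


# Assumed example values: an SM that allows 48 resident warps (Fermi).
full = occupancy(48, 48)   # every warp slot is used
half = occupancy(24, 48)   # only half of the warp slots are used
print(f"{full:.0%} {half:.0%}")
```

In practice the number of active warps is itself limited by the registers and shared memory each block needs, which is why the achieved ratio is often below 1:1.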
This might be related to this question on SO.
Edit, in answer to the follow-up questions:
- Warps
About the maximum number of warps that can be processed: each SM (streaming multiprocessor) has a fixed number of processing cores, and the GPU has a limited number of SMs. Even though this webinar is not up to date with newer architectures, it gives some good examples:
SM – Streaming multi-processors with multiple processing cores
Each SM contains 32 processing cores
Execute in a Single Instruction Multiple Thread (SIMT) fashion
Up to 16 SMs on a card for a maximum of 512 compute cores
And :
Fermi can have up to 48 active warps per SM (1536 threads)
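The Fermi figures quoted above can be checked with a little arithmetic. This is just a sketch: the helper names are made up for the example, and the warp size of 32 threads is the usual value on NVIDIA GPUs.

```python
WARP_SIZE = 32  # threads per warp on NVIDIA GPUs


def max_resident_threads(max_warps_per_sm: int, warp_size: int = WARP_SIZE) -> int:
    """Threads that can be resident on one SM when every warp slot is used."""
    return max_warps_per_sm * warp_size


def total_cores(num_sms: int, cores_per_sm: int) -> int:
    """Processing cores on the whole card."""
    return num_sms * cores_per_sm


fermi_threads = max_resident_threads(48)   # 48 active warps per SM
fermi_cores = total_cores(16, 32)          # 16 SMs with 32 cores each
print(fermi_threads, fermi_cores)          # 1536 512
```

Note that an SM can hold far more resident threads (1536) than it has cores (32), so at any given moment most resident warps are waiting rather than executing.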
- Processing warps
First, some of these terms do not always have a clear official definition; see for example this topic from Nvidia DevTalk.
As explained in that topic, a given warp is active once it has been allocated on the SM along with its resources.
Then it can be :
- eligible : it can issue an operation
- stalled : it cannot because of a resource/data dependency
This is possible because the architecture is SIMT, meaning Single Instruction, Multiple Threads. You will find a lot of reading material on this topic, which can be very useful if you plan on tuning occupancy.
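As a purely illustrative toy model of the two states listed above (this is not how the hardware scheduler is implemented, and all names such as `Warp` and `pick_eligible` are invented for the example), an active warp can be classified like this:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Warp:
    warp_id: int
    operands_ready: bool   # False models a pending data dependency
    unit_available: bool   # False models a busy resource

    def state(self) -> str:
        """An active warp is eligible only if nothing blocks its next instruction."""
        if self.operands_ready and self.unit_available:
            return "eligible"
        return "stalled"


def pick_eligible(active_warps: List[Warp]) -> Optional[Warp]:
    """Return the first eligible warp, or None if every active warp is stalled."""
    for warp in active_warps:
        if warp.state() == "eligible":
            return warp
    return None


# Three active warps: one waiting on data, one waiting on a resource, one ready.
warps = [Warp(0, False, True), Warp(1, True, False), Warp(2, True, True)]
states = [w.state() for w in warps]
chosen = pick_eligible(warps)
print(states, chosen.warp_id if chosen else None)
```

The more active warps there are, the more likely it is that at least one of them is eligible at any time, which is the reason for caring about occupancy in the first place.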