I've read these two pages: "Understanding Streaming Multiprocessors (SM) and Streaming Processors (SP)" and "How Concurrent blocks can run a single GPU streaming multiprocessor?", but I am still confused about the hardware structure.
- Is an SM a SIMT (single instruction, multiple threads) structure?
Suppose there are 8 SPs in a given SM. If different blocks can be executed on the same SM, then these SPs will be executing different instructions. So my understanding is that the SM issues different instructions to different SPs.
- Are the threads in the same warp executed simultaneously?
Suppose there are 8 SPs in a given SM, and a warp is resident on that SM. Since several warps may run on the SM at once, I suppose 4 SPs are running this warp. There are 32 threads in the warp but only 4 SPs to run them, so will it actually take 8 cycles (32 / 4) to run this warp? I have also heard someone say that all the threads in a warp run serially. I don't know which is true.
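
To make the arithmetic in the second question concrete, here is a small Python sketch of the model I have in mind. It assumes each SP handles one thread's instruction per cycle and ignores pipelining, latency, and scheduling. The function name `cycles_per_warp_instruction` and the `lanes` parameter are my own invention for illustration, not part of CUDA, and I am not claiming the hardware actually behaves this way.

```python
import math

WARP_SIZE = 32  # threads per warp, as stated in the question


def cycles_per_warp_instruction(lanes: int, warp_size: int = WARP_SIZE) -> int:
    """Cycles needed to push one instruction through a whole warp,
    assuming `lanes` SPs work on it and each SP handles one thread per cycle.
    This is a simplified mental model, not a description of real hardware."""
    if lanes <= 0:
        raise ValueError("lanes must be positive")
    return math.ceil(warp_size / lanes)


print(cycles_per_warp_instruction(4))  # 8  -> my guess: 4 SPs serve this warp
print(cycles_per_warp_instruction(8))  # 4  -> all 8 SPs serve this warp
print(cycles_per_warp_instruction(1))  # 32 -> the "threads run serially" claim
```

So the three possibilities I am trying to choose between correspond to 8, 4, or 32 cycles per warp instruction under this simplified model.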