
I have a question about how threadblocks are executed on the SMXs. I have run an experiment in which several kernels are launched from different MPI processes on a K20c GPU, with the GPU shared among the MPI processes via CUDA MPS. According to the MPS documentation (https://docs.nvidia.com/deploy/pdf/CUDA_Multi_Process_Service_Overview.pdf), a stream is associated with each MPI process, so that each MPI process can execute its kernel concurrently with the kernels belonging to the other MPI processes.

To understand this behaviour, I visualized the experiment with the Visual Profiler. It shows that some kernels are not just executed concurrently but are completely overlapped in time: the two kernels overlap over their entire duration, not only over a small part of it. It looks as if blocks belonging to both kernels are resident on the same SMX at the same time. As far as I know, an SMX can only hold blocks belonging to the same kernel. Do you have any idea why this is happening? Thank you so much.
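For reference, here is a minimal sketch of the kind of program each MPI process runs in my experiment. The file name `mps_test.cu`, the kernel `busyKernel`, and all sizes/iteration counts are illustrative placeholders, not the actual experiment code:

```cpp
// mps_test.cu -- build with nvcc plus your MPI installation's
// include/lib flags, then run under an MPS server with e.g.:
//   mpirun -np 4 ./mps_test
#include <mpi.h>
#include <cstdio>

// A deliberately long-running kernel so overlap is visible in the profiler.
__global__ void busyKernel(float *data, int n, int iters)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) {
        float v = data[idx];
        for (int i = 0; i < iters; ++i)
            v = v * 0.999f + 0.001f;   // dummy dependent work
        data[idx] = v;
    }
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int n = 1 << 20;
    float *d_data;
    cudaMalloc(&d_data, n * sizeof(float));

    // Each MPI process launches its own kernel. With MPS, launches from
    // all processes are funneled into a single GPU context and may
    // execute concurrently on the same SMXs.
    busyKernel<<<(n + 255) / 256, 256>>>(d_data, n, 100000);
    cudaDeviceSynchronize();

    printf("rank %d done\n", rank);
    cudaFree(d_data);
    MPI_Finalize();
    return 0;
}
```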

EDIT:

Thanks for your response @RobertCrovella. I have taken a look at the slides you suggested. According to them, each SMX has four warp schedulers, and the warps can come from different threadblocks or from different concurrent kernels. I understand "different concurrent kernels" to mean different kernels launched (concurrently) on different streams. So warps belonging to blocks from different kernels can indeed be scheduled by the four warp schedulers of a single SMX. However, I have only observed this behaviour when my application is launched under CUDA MPS. When my application uses plain streams within one process (no MPS), only a small part of the kernels overlaps (the end of one kernel with the start of the next). That tail overlap is expected as far as I know, but the complete overlap under MPS still seems strange to me.
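For comparison, here is a minimal sketch of the single-process, plain-streams version of the experiment (again, kernel name and sizes are placeholders rather than my actual code):

```cpp
// streams_test.cu -- single process, two streams, no MPS.
#include <cstdio>

__global__ void busyKernel(float *data, int n, int iters)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) {
        float v = data[idx];
        for (int i = 0; i < iters; ++i)
            v = v * 0.999f + 0.001f;   // dummy dependent work
        data[idx] = v;
    }
}

int main()
{
    const int n = 1 << 20;
    float *d_a, *d_b;
    cudaMalloc(&d_a, n * sizeof(float));
    cudaMalloc(&d_b, n * sizeof(float));

    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);

    // With grids large enough to fill the GPU, the second kernel can only
    // start as blocks of the first retire, so only the tail of kernel 1
    // overlaps the head of kernel 2. Shrinking the grids so that both
    // kernels fit on the device at once should make the overlap larger.
    busyKernel<<<(n + 255) / 256, 256, 0, s1>>>(d_a, n, 100000);
    busyKernel<<<(n + 255) / 256, 256, 0, s2>>>(d_b, n, 100000);
    cudaDeviceSynchronize();

    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
    cudaFree(d_a);
    cudaFree(d_b);
    return 0;
}
```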

Comments:
    "As far as I know, a SMx can only have blocks belonging to the same kernel. " Do you have any reason to believe this? Or any documentation that supports that? Because I don't believe it is correct. You may want to refer to [the answer here](http://stackoverflow.com/questions/24424206/cuda-concurrent-unique-kernels-on-the-same-multiprocessor) which includes a comment link to an NVIDIA-published presentation which indicates that blocks from multiple kernels can be resident on an SM. Unless you can support your statement, I'm inclined to mark this question as a duplicate of that one. – Robert Crovella Jun 09 '15 at 14:41
  • Take a look at slide 19 [here](http://on-demand.gputechconf.com/gtc/2013/presentations/S3466-Programming-Guidelines-GPU-Architecture.pdf), which states: "Warps can come from different threadblocks **and different concurrent kernels**" – Robert Crovella Jun 09 '15 at 14:42
  • I believe the answer to this question, i.e. the explanation for the overlap behavior you see with MPS, is that your original premise ("As far as I know, an SMX can only hold blocks belonging to the same kernel.") was wrong. If you have some other scenario where you don't see overlap but expect to, I suggest asking a new question and providing a complete [MCVE](http://stackoverflow.com/help/mcve), i.e. a simplified *but complete* case that someone else could test. – Robert Crovella Jun 10 '15 at 14:29

0 Answers