Previously, I tried NVML using the function nvmlDeviceGetUtilizationRates(). I tested it this way: while the collection was running, I executed a DFT (the kernel is launched as <<<7,32>>>) on a Tesla C2070, which has 14 SMs. My expectation was that 7 blocks would execute on the GPU at the same time, so the utilization should be 50%, but the API reported 99%, which would mean the GPU is fully used. Then I read the NVML documentation: nvmlDeviceGetUtilizationRates() only returns the percent of time over the past sample period during which one or more kernels was executing on the GPU. How can I get the number of active SMs while a kernel is running on the GPU?
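For reference, a minimal sketch of the NVML query described above (device index 0 is an assumption; this needs an NVIDIA GPU and linking against libnvidia-ml to run):

```c
#include <stdio.h>
#include <nvml.h>

int main(void)
{
    nvmlReturn_t rc = nvmlInit();
    if (rc != NVML_SUCCESS) {
        fprintf(stderr, "nvmlInit failed: %s\n", nvmlErrorString(rc));
        return 1;
    }

    nvmlDevice_t dev;
    rc = nvmlDeviceGetHandleByIndex(0, &dev);  /* assumes device 0 */
    if (rc == NVML_SUCCESS) {
        nvmlUtilization_t util;
        rc = nvmlDeviceGetUtilizationRates(dev, &util);
        if (rc == NVML_SUCCESS) {
            /* util.gpu is the percent of time during the past sample
               period in which one or more kernels was executing --
               NOT the fraction of SMs that were occupied. */
            printf("gpu: %u%%  memory: %u%%\n", util.gpu, util.memory);
        }
    }

    nvmlShutdown();
    return 0;
}
```

This is why the 7-block kernel shows 99%: any kernel activity at all during the sample period counts the whole GPU as busy, regardless of how many SMs the blocks occupy.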
- I don't think this is trivial. It is possible to query which SM a block is running on via inline PTX, so it should be possible to track how many blocks are on each SM at a given point in time. I suppose this wouldn't work with library calls, but would (at a performance cost) on your own kernels. – Jez May 26 '15 at 12:28
- Thanks for your response, Jez. My purpose is to schedule multiple tasks on a GPU, where the scheduler is independent of the CUDA programs. I glanced over ptx_isa_3.2.pdf in the CUDA Samples and tried ptxjit in NVIDIA_CUDA-5.5_Samples/6_Advanced/ptxjit. If I must write inline PTX in my own kernels, that is not my original intention. Is it possible to use this from my scheduler to query the SMs that may be used by another kernel simultaneously? – Loong Draw May 27 '15 at 13:08
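The inline-PTX approach mentioned in the comments can be sketched as follows. Each block reads the `%smid` special register and records which SM it landed on; the host then sees which SMs were touched. Note this only works inside kernels you control, not for other processes' kernels, and it needs a CUDA-capable GPU to run (error checking is omitted for brevity):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Read the id of the SM this block is currently running on.
__device__ unsigned int get_smid(void)
{
    unsigned int smid;
    asm volatile("mov.u32 %0, %%smid;" : "=r"(smid));
    return smid;
}

// One thread per block records the block's SM id.
__global__ void record_smid(unsigned int *smids)
{
    if (threadIdx.x == 0)
        smids[blockIdx.x] = get_smid();
}

int main(void)
{
    const int blocks = 7, threads = 32;   // matches the <<<7,32>>> launch in the question
    unsigned int *d_smids, h_smids[blocks];

    cudaMalloc(&d_smids, blocks * sizeof(unsigned int));
    record_smid<<<blocks, threads>>>(d_smids);
    cudaMemcpy(h_smids, d_smids, blocks * sizeof(unsigned int),
               cudaMemcpyDeviceToHost);

    for (int b = 0; b < blocks; ++b)
        printf("block %d ran on SM %u\n", b, h_smids[b]);

    cudaFree(d_smids);
    return 0;
}
```

With 7 blocks on a 14-SM C2070, the printed SM ids would typically be distinct, confirming that only half the SMs are occupied even though NVML reports near-100% utilization.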