
I'm reading the HSA spec, and it says that a user-mode application can submit jobs to GPU queues directly, without any OS interaction. I think this must be because the application can talk to the GPU driver directly and therefore doesn't need to make any OS kernel calls.

So my question is: to take a very simple example, when a CUDA application calls cudaMalloc(), does that incur any OS kernel calls?

Sagar Masuti
fyang29
  • What is your understanding of an OS kernel call? Why would it matter? Oh and by the way, cudaMalloc can also allocate host memory. There has to be some notification to the OS for doing that. – Jonas Bötel Nov 29 '13 at 08:28
    @LumpN `cudaMalloc` allocates device memory only. – Vitality Nov 29 '13 at 09:05
  • @fyang29 Try [strace](http://stackoverflow.com/questions/174942/how-to-use-strace)? – Vitality Nov 29 '13 at 09:09

1 Answer


The entire premise of this question is flawed. "Submitting a job" and allocating memory are not the same thing. Even a user-space process running on the host CPU that calls malloc will, sooner or later, cause a kernel call, because the standard library has to grow or shrink its memory heap, normally via sbrk or mmap.
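
To make the host-side half of that concrete, here is a minimal sketch (Linux/POSIX assumed) of the kind of request an allocator makes to the kernel when it needs more memory. The helper names `grab_pages` and `drop_pages` are hypothetical, and this is not how any particular malloc implementation is written; it only shows that obtaining fresh pages goes through a system call.

```c
#define _DEFAULT_SOURCE          /* exposes MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: ask the kernel for `len` bytes of zero-filled,
   private anonymous memory. This is one mmap system call. */
void *grab_pages(size_t len)
{
    assert(len > 0);
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Hypothetical helper: hand the pages back to the kernel.
   This is one munmap system call; returns 0 on success. */
int drop_pages(void *p, size_t len)
{
    return munmap(p, len);
}
```

If you call these from a small main and run the binary under strace, you should see the mmap and munmap calls in the trace. Reads and writes to the mapped memory in between are ordinary loads and stores and do not show up as system calls.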

So yes, cudaMalloc results in an OS kernel call. If you run strace, you will see the GPU driver invoking ioctl to issue commands to the GPU MMU/TLB. But so does running malloc in host code, and so, undoubtedly, does running malloc on a theoretical HSA platform.

talonmies
  • Thanks for your reply. I know malloc will result in a kernel call. My question comes from the HSA spec, which says "The HSA approach to workload data flow reduces kernel mode transitions by allowing a direct connection (using user-space queues) between the TCUs and the user mode application. This is in contrast to the legacy GPU model, which relies on copying the workload, patching command buffers, and numerous transitions between user mode and kernel mode." That's the part I can't understand. I believe there must be some trick inside the driver or hardware, and I want to know what it is. – fyang29 Nov 30 '13 at 00:45
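
One way to picture the "user-space queues" in the quoted passage is a ring buffer in memory that both the application and the consumer can see. The following is only a toy model under that assumption: the types `packet_t` and `user_queue_t` and the functions `queue_submit` and `queue_consume` are hypothetical and are not the HSA runtime API, and the consumer here is an ordinary function standing in for the device. It shows that submitting work can be a few plain stores plus one atomic index update, with no kernel transition on the submission path.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define QUEUE_SLOTS 8u
static_assert((QUEUE_SLOTS & (QUEUE_SLOTS - 1u)) == 0, "slot count must be a power of two");

/* Hypothetical packet layout, for illustration only. */
typedef struct { uint32_t kernel_id; uint32_t arg; } packet_t;

/* Hypothetical single-producer, single-consumer queue in shared memory. */
typedef struct {
    packet_t slots[QUEUE_SLOTS];
    _Atomic uint64_t write_index;  /* advanced only by the producer (application) */
    _Atomic uint64_t read_index;   /* advanced only by the consumer (device stand-in) */
} user_queue_t;

/* Producer side: returns 0 on success, -1 if the queue is full.
   Only memory operations are used; there is no system call here. */
int queue_submit(user_queue_t *q, packet_t pkt)
{
    uint64_t w = atomic_load_explicit(&q->write_index, memory_order_relaxed);
    uint64_t r = atomic_load_explicit(&q->read_index, memory_order_acquire);
    if (w - r == QUEUE_SLOTS)
        return -1;
    q->slots[w % QUEUE_SLOTS] = pkt;
    /* The release store publishes the packet before the new index becomes visible. */
    atomic_store_explicit(&q->write_index, w + 1, memory_order_release);
    return 0;
}

/* Consumer side: returns 0 and fills *out, or -1 if the queue is empty. */
int queue_consume(user_queue_t *q, packet_t *out)
{
    uint64_t r = atomic_load_explicit(&q->read_index, memory_order_relaxed);
    uint64_t w = atomic_load_explicit(&q->write_index, memory_order_acquire);
    if (r == w)
        return -1;
    *out = q->slots[r % QUEUE_SLOTS];
    atomic_store_explicit(&q->read_index, r + 1, memory_order_release);
    return 0;
}
```

Setting up such a queue (allocating the memory and making it visible to the consumer) would still need the kernel once; the point of the sketch is only that each later submission does not.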