
I've built a program that uses Hybridizer to write CUDA code in C# and call it. The program works, but the overhead of setting up the GPU and invoking a function on it is extremely high. For example, a job that took 3,000 ticks on the CPU took about 50 million ticks to set up the GPU wrapper and another 50 million ticks to run on the GPU. I'm trying to figure out whether this lag is caused by Hybridizer itself or is simply unavoidable when calling GPU code from a C# program.
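
One way to narrow this down is to time the first call separately from later calls, since a one-time setup cost would show up only in the first measurement. The following is a minimal sketch under that assumption. `GpuTiming` and `Measure` are hypothetical names, and `work` stands in for whatever GPU call is being measured; nothing here is specific to Hybridizer.

```csharp
using System;
using System.Diagnostics;

public static class GpuTiming
{
    // Runs 'work' once and records its cost (the "cold" call), then runs it
    // 'repeats' more times and reports the average cost of those "warm" calls.
    // Both values are in Stopwatch ticks.
    public static (long firstTicks, long averageWarmTicks) Measure(Action work, int repeats)
    {
        if (work == null) throw new ArgumentNullException(nameof(work));
        if (repeats <= 0) throw new ArgumentOutOfRangeException(nameof(repeats));

        // First call: includes any one-time initialization the callee performs.
        var stopwatch = Stopwatch.StartNew();
        work();
        stopwatch.Stop();
        long firstTicks = stopwatch.ElapsedTicks;

        // Later calls: initialization should already be paid for.
        stopwatch.Restart();
        for (int i = 0; i < repeats; i++)
        {
            work();
        }
        stopwatch.Stop();

        return (firstTicks, stopwatch.ElapsedTicks / repeats);
    }
}
```

If the first value is far larger than the average, most of the cost is probably initialization rather than per-call overhead.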

So I'm looking for alternatives. My searches turned up mentions of P/Invoke, but I can't find a good guide on how to use it, and all of those threads are 9+ years old, so I don't know whether their information is still relevant. I also found ManagedCuda, but it seems to be no longer in development.

Caleb Johnson

1 Answer


You can try CppSharp to generate C# bindings for CUDA. With this approach we were able to initialize CUDA and call its simple hardware-info functions (GetDeviceProperties, CudaSetDevice, CudaGetDeviceCount, CudaDriverGetVersion, CudaRuntimeGetVersion).

Using other parts of the CUDA API should also be possible, since CppSharp generated bindings for the whole CUDA runtime API, but we did not try it. We use CUDA indirectly, through NVIDIA's Flex library, and all of the Flex functions are usable via CppSharp without considerable penalties.

Example usage of the classes generated by CppSharp:

// Query the installed driver version.
int driverVersion = 0;
CudaRuntimeApi.CudaDriverGetVersion(ref driverVersion);

// Query the CUDA runtime version.
int runtimeVersion = 0;
CudaRuntimeApi.CudaRuntimeGetVersion(ref runtimeVersion);

// Count the CUDA-capable devices and stop on failure.
int deviceCount = 0;
var errorCode = CudaRuntimeApi.CudaGetDeviceCount(ref deviceCount);

if (errorCode != CudaError.CudaSuccess)
{
    Console.Error.WriteLine("'cudaGetDeviceCount' returned " + errorCode + ": " + CudaRuntimeApi.CudaGetErrorString(errorCode));
    return;
}

// Fetch the properties of each device; the generated wrapper is disposable.
for (var device = 0; device < deviceCount; ++device)
{
    using (var deviceProperties = new CudaDeviceProp())
    {
        CudaRuntimeApi.CudaGetDeviceProperties(deviceProperties, device);
    }
}

CudaRuntimeApi and CudaDeviceProp are the classes generated by CppSharp.
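
For comparison with the P/Invoke route mentioned in the question, a hand-written declaration for one of the same runtime functions might look like the untested sketch below. Several things in it are assumptions: the library name `cudart64_12` depends on the installed CUDA version and platform, and the wrapper class `CudaRuntimeNative` and its `TryGetDeviceCount` helper are hypothetical names chosen for illustration. Only `cudaGetDeviceCount` itself comes from the CUDA runtime API.

```csharp
using System;
using System.Runtime.InteropServices;

public static class CudaRuntimeNative
{
    // ASSUMPTION: the runtime library name varies with the CUDA version and
    // platform (e.g. "cudart64_110" for CUDA 11.x on Windows). Adjust as needed.
    private const string CudaRuntimeLibrary = "cudart64_12";

    // Native signature: cudaError_t cudaGetDeviceCount(int* count);
    // cudaError_t is an enum, marshalled here as a plain int (0 == cudaSuccess).
    [DllImport(CudaRuntimeLibrary, EntryPoint = "cudaGetDeviceCount")]
    private static extern int NativeGetDeviceCount(out int count);

    // Hypothetical helper: returns true and the device count on success,
    // or false and a description of what went wrong.
    public static bool TryGetDeviceCount(out int count, out string error)
    {
        count = 0;
        error = string.Empty;
        try
        {
            int status = NativeGetDeviceCount(out count);
            if (status != 0)
            {
                count = 0;
                error = "cudaGetDeviceCount returned error code " + status;
                return false;
            }
            return true;
        }
        catch (Exception ex) when (ex is DllNotFoundException
                                   || ex is EntryPointNotFoundException
                                   || ex is BadImageFormatException)
        {
            // The CUDA runtime is missing, mismatched, or the wrong bitness.
            count = 0;
            error = "CUDA runtime library could not be loaded: " + ex.Message;
            return false;
        }
    }
}
```

A caller would check the return value of `CudaRuntimeNative.TryGetDeviceCount(out int count, out string error)` and print either `count` or `error`. Writing such declarations by hand scales poorly across a large API, which is the work a bindings generator automates.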

Denis Gladkiy
  • Why not provide an example? – Robert Crovella Jun 24 '20 at 04:28
  • @Robert Crovella, CppSharp is a tool for bindings generation. Are you asking for scripts invoking it? Or examples of generated code? The code initializing CUDA is textually the same as in C++. – Denis Gladkiy Jun 25 '20 at 02:58
  • The C# part. You could simply demonstrate how to run a sample code like `deviceQuery` from C#. The CUDA code used as an example isn't that important, but it would be nice to see something complete, that works. I provide lots of fully worked examples in my answers, even ones that include things like OpenMP and calling CUDA code from python. Nobody charges you by the word or character to post here, so extreme brevity isn't really an attractive feature in an SO answer, in my opinion. [Here](https://stackoverflow.com/questions/45515526/) is an example of calling CUDA from python using ctypes. – Robert Crovella Jun 25 '20 at 03:19
  • Do as you wish, of course. Use your judgment. If I knew precisely what it takes to create a great answer to demonstrate the use of CppSharp to wrap CUDA code, we probably wouldn't be having this dialog. – Robert Crovella Jun 25 '20 at 03:21
  • @Robert Crovella, oh, I see. I'll try to post the code. – Denis Gladkiy Jun 25 '20 at 03:21
  • On SO, I think it's pretty universally agreed that we like code. – Robert Crovella Jun 25 '20 at 03:22