I'm currently attempting to connect some form of output from a CUDA program to a GL_TEXTURE_2D
for use in rendering. I'm not that worried about the output type from CUDA (whether it'd be an array or surface, I can adapt the program to that).
So the question is, how would I do that? (my current code copies the output array to system memory, and uploads it to the GPU again with GL.TexImage2D
, which is obviously highly inefficient - when I disable those two pieces of code, it goes from approximately 300 kernel executions per second to a whopping 400)
I already have a little bit of test code, to at least bind a GL texture to CUDA, but I'm not even able to get the device pointer from it...
ctx = CudaContext.CreateOpenGLContext(CudaContext.GetMaxGflopsDeviceId(), CUCtxFlags.SchedAuto);
uint textureID = (uint)GL.GenTexture(); //create a texture in GL
GL.TexParameter(TextureTarget.Texture2D, TextureParameterName.TextureMinFilter, (int)TextureMinFilter.Linear);
GL.TexParameter(TextureTarget.Texture2D, TextureParameterName.TextureMagFilter, (int)TextureMagFilter.Linear);
GL.TexImage2D(TextureTarget.Texture2D, 0, PixelInternalFormat.Rgba, width, height, 0, OpenTK.Graphics.OpenGL.PixelFormat.Rgba, PixelType.UnsignedByte, null); //allocate memory for the texture in GL
CudaOpenGLImageInteropResource resultImage = new CudaOpenGLImageInteropResource(textureID, CUGraphicsRegisterFlags.WriteDiscard, CudaOpenGLImageInteropResource.OpenGLImageTarget.GL_TEXTURE_2D, CUGraphicsMapResourceFlags.WriteDiscard); //using writediscard because the CUDA kernel will only write to this texture
//then, as far as I understood the ManagedCuda example, I have to do the following when I call my kernel
//(done without a CudaGraphicsInteropResourceCollection because I only have one item)
resultImage.Map();
var ptr = resultImage.GetMappedPointer(); //this crashes
kernelSample.Run(ptr); //pass the pointer to the kernel so it knows where to write
resultImage.UnMap();
The following exception is thrown when attempting to get the pointer:
ErrorNotMappedAsPointer: This indicates that a mapped resource is not available for access as a pointer.
What do I need to do to fix this?
And even if this exception can be resolved, how would I solve the other part of my question; that is, how do I work with the acquired pointer in my kernel? Can I use a surface for that? Access it as an arbitrary array (pointer arithmetic)?
Edit: Looking at this example, apparently I don't even need to map the resource every time I call the kernel, and call the render function. But how would this translate to ManangedCUDA?