I'm trying to build a transcoding pipeline in which video is decoded using D3D11VA, brought into CUDA, optionally modified and/or analyzed by CUDA kernels, and finally encoded with NVENC (via CUDA-NVENC interop); the idea is to do everything on the GPU, without video frames ever hitting main memory. Some things I have working so far:
- D3D11VA decoding works (using a Texture2D array with 20 surfaces in NV12 format bound to the video decoder); the decoder gives me an index into this array for every decoded frame
- I can easily get the data out to main memory by using a separate Texture2D with the same dimensions and format as the decoding array, but created with `D3D11_USAGE_STAGING` and `D3D11_CPU_ACCESS_READ`; once the decoder has provided me with an index into the decoder array, I just do `CopySubresourceRegion` from that decoder array slice to this staging texture, then map the staging texture and read the data (I can successfully read the data for both the Y and UV planes)
- I can also register the staging texture as a CUDA resource (even though the CUDA manual doesn't list NV12 as a supported pixel format); I can then map this resource, apply `cudaGraphicsSubResourceGetMappedArray` to it and copy data from the returned `cudaArray` into malloc'd CUDA memory.
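For context, the staging-copy plus CUDA-registration path above looks roughly like the sketch below. This is an illustration only, not compilable standalone: it needs `d3d11.h`, `cuda_runtime.h`, `cuda_d3d11_interop.h` and live devices, and names like `g_ctx`, `g_staging` and `g_decoderArray` are placeholders for the app's own device context, staging texture and decoder texture array; error checking is omitted.

```cpp
// 1. Copy the decoded slice out of the decoder's texture array:
g_ctx->CopySubresourceRegion(
    g_staging, 0,                            // dst: staging NV12 texture
    0, 0, 0,                                 // dst x/y/z
    g_decoderArray,                          // src: the 20-slice decoder array
    D3D11CalcSubresource(0, sliceIndex, 1),  // src: slice the decoder returned
    nullptr);                                // copy the whole surface

// 2. Register and map the staging texture for CUDA:
cudaGraphicsResource* res = nullptr;
cudaGraphicsD3D11RegisterResource(&res, g_staging,
                                  cudaGraphicsRegisterFlagsNone);
cudaGraphicsMapResources(1, &res);

// 3. Get the backing array; in practice only the Y plane is reachable here.
cudaArray_t arr = nullptr;
cudaGraphicsSubResourceGetMappedArray(&arr, res, 0, 0);
cudaMemcpy2DFromArray(d_yPlane, pitch,       // dst device buffer and stride
                      arr, 0, 0,             // src array, x/y offset
                      width, height,         // copy extent in bytes x rows
                      cudaMemcpyDeviceToDevice);

cudaGraphicsUnmapResources(1, &res);
```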
So the issue is: I can only copy the Y plane from the `cudaArray`. I tried everything I could think of to get the UV data out of the texture somehow, to no avail. The only "solution" that worked was to create yet another texture, 1.5x the height of the original, in R8 format, create two shader resource views into the staging texture, and use a shader that just copies the data from both views into this helper texture; I could then map that texture to CUDA and copy all the data into CUDA memory.
I really dislike this solution: it's ugly, bloated, and involves an extra useless data copy. Is there any other way to achieve this? A way to get CUDA to see all the data in an NV12 texture, or alternatively to copy all the data out of an NV12 texture into a single R8 texture, or into a pair of R8/R8 or R8/R8G8 textures?