I'm trying to build a transcoding pipeline in which video is decoded using D3D11VA, brought into CUDA, optionally modified and/or analyzed by CUDA kernels, and finally encoded with NVENC (via CUDA-NVENC interop); the idea is to do everything on the GPU, without video frames ever hitting main memory. Some things I have working so far:
- D3D11VA decoding works (using a Texture2D array with 20 surfaces in NV12 format bound to the video decoder); the decoder gives me an index into this array for every decoded frame
- I can easily get the data out to main memory by using a separate Texture2D with the same dimensions and format as the decoding array, but created with `D3D11_USAGE_STAGING` and `D3D11_CPU_ACCESS_READ`; once the decoder has provided me with an index into the decoder array, I just do `CopySubresourceRegion` from that decoder array slice to this staging texture, then map the staging texture and read the data (I can successfully read the data for both the Y and UV planes)
- I can also register the staging texture as a CUDA resource (even though the CUDA manual doesn't list NV12 as a supported pixel format); I can then map this resource, apply `cudaGraphicsSubResourceGetMappedArray` to it and copy data from the returned `cudaArray` into malloc'd CUDA memory.
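For context, the staging-copy plus CUDA-registration path above looks roughly like the sketch below. This is an illustration only, not compilable standalone: it needs `d3d11.h`, `cuda_runtime.h`, `cuda_d3d11_interop.h` and live devices, and names like `g_ctx`, `g_staging` and `g_decoderArray` are placeholders for the app's own device context, staging texture and decoder texture array; error checking is omitted.

```cpp
// 1. Copy the decoded slice out of the decoder's texture array:
g_ctx->CopySubresourceRegion(
    g_staging, 0,                            // dst: staging NV12 texture
    0, 0, 0,                                 // dst x/y/z
    g_decoderArray,                          // src: the 20-slice decoder array
    D3D11CalcSubresource(0, sliceIndex, 1),  // src: slice the decoder returned
    nullptr);                                // copy the whole surface

// 2. Register and map the staging texture for CUDA:
cudaGraphicsResource* res = nullptr;
cudaGraphicsD3D11RegisterResource(&res, g_staging,
                                  cudaGraphicsRegisterFlagsNone);
cudaGraphicsMapResources(1, &res);

// 3. Get the backing array; in practice only the Y plane is reachable here.
cudaArray_t arr = nullptr;
cudaGraphicsSubResourceGetMappedArray(&arr, res, 0, 0);
cudaMemcpy2DFromArray(d_yPlane, pitch,       // dst device buffer and stride
                      arr, 0, 0,             // src array, x/y offset
                      width, height,         // copy extent in bytes x rows
                      cudaMemcpyDeviceToDevice);

cudaGraphicsUnmapResources(1, &res);
```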
So the issue is: I can only copy the Y plane from the `cudaArray`. I tried everything I could think of to get the UV data out of the texture somehow, to no avail. The only "solution" that worked was to create yet another texture, 1.5x the height of the original, in R8 format, create two shader resource views into the staging texture, and use a shader that just copies the data from both views into this helper texture; I could then map that texture to CUDA and copy all the data into CUDA memory.
I really dislike this solution: it's ugly, bloated, and involves an extra useless data copy. Is there any other way to achieve this? A way to get CUDA to see all the data in an NV12 texture, or alternatively to copy all the data out of an NV12 texture into a single R8 texture, or into a pair of R8/R8 or R8/R8G8 textures?