I'm trying to copy the NV12 buffer decoded by NVDEC directly into an NV12 D3D11 texture, with no luck so far. What I have managed is a two-pass copy using 2 D3D11 textures (luma + chroma), 2 cuGraphicsMapResources calls, 2 cuGraphicsSubResourceGetMappedArray calls, 2 CUDA_MEMCPY2D copies and a pixel shader to merge the planes. I have found no way to do the copy in a single pass, and there has been no response on the NVIDIA forum so far.

I found this old question describing a very similar problem, but it has no solution either.

Alexis Wilke
Fabio

2 Answers

Perhaps you need something like this. This code snippet is taken from the open-source FFmpeg project, file libavutil/hwcontext_cuda.c:

for (i = 0; i < FF_ARRAY_ELEMS(src->data) && src->data[i]; i++) {
    CUDA_MEMCPY2D cpy = {
        .srcMemoryType = CU_MEMORYTYPE_HOST,
        .dstMemoryType = CU_MEMORYTYPE_DEVICE,
        .srcHost       = src->data[i],
        .dstDevice     = (CUdeviceptr)dst->data[i],
        .srcPitch      = src->linesize[i],
        .dstPitch      = dst->linesize[i],
        .WidthInBytes  = FFMIN(src->linesize[i], dst->linesize[i]),
        .Height        = src->height >> (i ? priv->shift_height : 0),
    };

    ret = CHECK_CU(cu->cuMemcpy2DAsync(&cpy, hwctx->stream));
    if (ret < 0)
        goto exit;
}
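
To make the plane iteration in the loop above easier to follow, here is a minimal host-side sketch in plain C, with no CUDA involved. It assumes NV12 with even width and height, so the interleaved UV plane has the same row width in bytes as the Y plane and half as many rows (a height shift of 1). The helper names `copy_plane_2d` and `copy_nv12` are hypothetical and are not part of FFmpeg or CUDA; each `copy_plane_2d` call only mirrors what one CUDA_MEMCPY2D descriptor expresses (two pitches, a width in bytes and a row count).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy `height` rows of `width_bytes` bytes between two
 * pitched buffers. Each row starts `pitch` bytes after the previous one, and
 * bytes past `width_bytes` in a row are padding that is left untouched. */
static void copy_plane_2d(uint8_t *dst, size_t dst_pitch,
                          const uint8_t *src, size_t src_pitch,
                          size_t width_bytes, size_t height)
{
    assert(width_bytes <= dst_pitch && width_bytes <= src_pitch);
    for (size_t row = 0; row < height; row++)
        memcpy(dst + row * dst_pitch, src + row * src_pitch, width_bytes);
}

/* Hypothetical helper: copy an NV12 frame plane by plane.
 * Plane 0 is Y:  `width` bytes per row, `height` rows.
 * Plane 1 is UV: `width` bytes per row (interleaved U and V), `height / 2` rows. */
static void copy_nv12(uint8_t *const dst[2], const size_t dst_pitch[2],
                      const uint8_t *const src[2], const size_t src_pitch[2],
                      size_t width, size_t height)
{
    for (int i = 0; i < 2; i++) {
        /* Assumption: 4:2:0 chroma, so the UV plane has half the rows. */
        size_t rows = height >> (i ? 1 : 0);
        copy_plane_2d(dst[i], dst_pitch[i], src[i], src_pitch[i], width, rows);
    }
}
```

The point of the sketch is that the source and destination pitches are independent, so the two planes need one 2D copy each unless both sides share the same pitch and the chroma plane sits directly after the luma plane.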
the kamilz
  • No, this is a HOST to DEVICE copy; they are copying from system memory to GPU memory. – Fabio May 21 '20 at 17:39
  • OK, never mind that part. Focus on how FFmpeg iterates over the Y and UV planes (the for loop) and uses the linesizes as pitches. I thought this might help. – the kamilz May 22 '20 at 12:51

I'm not sure how this can be done with NVIDIA/CUDA, as I'm not familiar with it. But this is how I managed to do it with Direct3D (D3D11VA), which might help you translate it to your situation:

  1. (NV12 NVDEC Device).CopySubresourceRegion(src NV12 NVDEC texture, srcSubresourceArrayIndex, dst NV12 shared texture)

(Get Shared Handle for the newly created NV12 shared texture)

  2. (Your Device).OpenSharedResource(NV12 shared handle)

(Prepare VideoProcessorInputView, VideoProcessorOutputView and Streams)

  3. (Your Device).VideoProcessorBlt(src NV12 shared handle, dst Your RGBA/BGRA Render Texture)

This process is hardware video acceleration and runs entirely on the GPU (no CPU/RAM involved). You should also make sure that your GPU adapter supports it.

SuRGeoNix