Nppi Color Conversion Issue

Question

I'm attempting to convert a frame of 3-channel packed RGB to NV12 using Nvidia's NPP library. Here is the code I have so far:

//cpu buffer that will hold converted data
Npp8u* converted_data = (Npp8u*)malloc(frameToWrite.getWidth());
memset(converted_data, 0, frameToWrite.getSize());

//Begin - load data and convert rgb to yuv
{
    NppStatus ret = NPP_SUCCESS;
    int stepSource;
    Npp8u* frame = nppiMalloc_8u_C3(frameToWrite.getWidth(), frameToWrite.getHeight(), &stepSource);
    cudaMemcpy2D(frame, stepSource, frameToWrite.getFrame(), frameToWrite.getSizePerRow(), frameToWrite.getWidth(), frameToWrite.getHeight(), cudaMemcpyHostToDevice);

    int stepDestP1, stepDestP2, stepDestP3;
    Npp8u* m_stYuvP1 = nppiMalloc_8u_C1(frameToWrite.getWidth(), frameToWrite.getHeight(), &stepDestP1);
    Npp8u* m_stYuvP2 = nppiMalloc_8u_C1(frameToWrite.getWidth(), frameToWrite.getHeight(), &stepDestP2);
    Npp8u* m_stYuvP3 = nppiMalloc_8u_C1(frameToWrite.getWidth(), frameToWrite.getHeight(), &stepDestP3);
    int d_steps[3] = { stepDestP1, stepDestP2, stepDestP3 };
    Npp8u* d_ptrs[3] = { m_stYuvP1, m_stYuvP2, m_stYuvP3 };

    NppiSize ROI = { frameToWrite.getWidth(), frameToWrite.getHeight() };

    if ((ret = nppiRGBToYUV_8u_C3P3R(frame, stepSource, d_ptrs, stepDestP1, ROI)) != NPP_SUCCESS)
        return ERROR_CODE_NVENC_ERROR_UNKNOWN;

    cudaMemcpy2D(converted_data, frameToWrite.getWidth(), m_stYuvP1, stepDestP1, frameToWrite.getWidth(), frameToWrite.getHeight(), cudaMemcpyDeviceToHost);
}

It's mostly based on this Stack Overflow question, but I adjusted it to fit my case. As a side note, frameToWrite.getSize() is calculated like this:

mFrameSize = ((getBytesPerPixel() * mWidth) + mPaddingInBytes) * mHeight;

where getBytesPerPixel() usually returns 3.

Ultimately my questions are:

How should I go about retrieving the converted image data from device memory?
Did I pass the unconverted image data to the device in the correct manner?

Shilghter · Accepted Answer · 2015-07-29T17:37:51.350

Npp8u* converted_data = (Npp8u*)malloc(frameToWrite.getWidth());
memset(converted_data, 0, frameToWrite.getSize());

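First of all, if you haven't noticed already, you are probably allocating too little memory here: malloc is given only frameToWrite.getWidth() bytes, while memset then writes frameToWrite.getSize() bytes, a much larger area. Writing past the end of the allocation is undefined behaviour.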
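As a rough illustration of sizing the host buffer before clearing it, here is a minimal sketch. The helper names (planeBytes, nv12Bytes, makeHostBuffer) are hypothetical and not part of NPP or CUDA, and the code assumes a tightly packed 8-bit layout with even pixel dimensions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Bytes needed for one tightly packed 8-bit plane of width x height pixels.
// (Hypothetical helper, not an NPP function.)
std::size_t planeBytes(int width, int height) {
    assert(width > 0 && height > 0);
    return static_cast<std::size_t>(width) * static_cast<std::size_t>(height);
}

// Bytes needed for a tightly packed NV12 frame: a full-resolution Y plane
// followed by an interleaved UV plane subsampled by two in both directions.
// Assumes even dimensions.
std::size_t nv12Bytes(int width, int height) {
    assert(width % 2 == 0 && height % 2 == 0);
    return planeBytes(width, height) * 3 / 2;
}

// Allocate a zero-filled host buffer. The allocation size and the cleared
// size are the same number by construction, so they cannot drift apart the
// way separate malloc and memset sizes can.
std::vector<std::uint8_t> makeHostBuffer(std::size_t bytes) {
    return std::vector<std::uint8_t>(bytes, 0);
}
```

For the code in the question, which copies back only the first plane, something like makeHostBuffer(planeBytes(width, height)) would be the matching size; a full NV12 frame would need nv12Bytes(width, height).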

As to your questions:
It is difficult to say what your frameToWrite.getWidth() and frameToWrite.getHeight() return: are they image (pixel) dimensions or byte dimensions? Generally, when you allocate NPP buffers you should use byte dimensions, like so:
nppiMalloc_8u_C1(pixelWidth*bytesPerPixel, pixelHeight, &stepSource);
Additionally, the step size should equal the line length in bytes plus padding, in accordance with point 4.2.1 of the NPP documentation.
As to retrieving the image from memory: in my personal experience, the easiest way is simply to use cudaMemcpy. NPP allocates 2D memory with only a virtual split into rows, while the underlying data is still one aligned, contiguous block, so a single 1D cudaMemcpy call of step * height bytes is enough to get the data back (the host copy then still contains the row padding).
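To make the step and copy-back points concrete, here is a minimal host-only sketch of what a pitched row copy does. It runs on the CPU and is not the CUDA call itself; copyRows and unpad are hypothetical names, and the parameter order of copyRows is chosen to mirror cudaMemcpy2D (dst, dpitch, src, spitch, width, height). Note that the width argument is in bytes, so for a packed 3-channel 8-bit image it would be 3 times the pixel width.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Host-only stand-in for a pitched row copy. widthBytes is the number of
// payload bytes per row, NOT the number of pixels; the pitches are the
// distances in bytes between the starts of consecutive rows.
void copyRows(std::uint8_t* dst, std::size_t dstPitch,
              const std::uint8_t* src, std::size_t srcPitch,
              std::size_t widthBytes, std::size_t height) {
    assert(widthBytes <= dstPitch && widthBytes <= srcPitch);
    for (std::size_t row = 0; row < height; ++row) {
        std::memcpy(dst + row * dstPitch, src + row * srcPitch, widthBytes);
    }
}

// Hypothetical helper: strip the row padding from a pitched image and
// return the rows tightly packed (destination pitch == widthBytes).
std::vector<std::uint8_t> unpad(const std::vector<std::uint8_t>& pitched,
                                std::size_t pitch, std::size_t widthBytes,
                                std::size_t height) {
    assert(pitched.size() >= pitch * height);
    std::vector<std::uint8_t> packed(widthBytes * height);
    copyRows(packed.data(), widthBytes, pitched.data(), pitch, widthBytes, height);
    return packed;
}
```

A single 1D copy of pitch * height bytes corresponds to keeping the pitched vector as it is, padding included; the row-wise copy is what removes the padding when the host side expects tightly packed rows.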

`frameToWrite.getHeight` and `frameToWrite.getWidth` both return pixel dimensions. Would a call like `nppiMalloc_8u_C3` allocate a buffer with enough space for pixels with 3 channels (or bytes) each? — Declan Kelly, Jul 29 '15 at 22:34
