cpp rgb to yuv422 conversion

Question

I'm trying to convert an image (originally a QImage) in an RGB/RGBA format (which can be changed) to a YUV422 format. My initial intention was to use OpenCV's cvtColor to do the work, but it does not support conversion from RGB/RGBA to a 4:2:2 format.

I searched for alternatives and even considered writing my own conversion according to this, but it would not be fast enough.

I searched for another library to use and found this post, but it is really old and not very relevant.

So my question is what good options do I have for RGB->YUV422 conversions? It would be better if they perform conversions on the GPU instead of the CPU.

Thanks in advance

The [OpenCV](https://github.com/opencv/opencv) project also has a [CUDA conversion part](https://github.com/opencv/opencv/blob/master/modules/cudacodec/src/cuda/nv12_to_rgb.cu). I think this would be the best way to go since it's performed on the GPU. — HMD, Apr 22 '18 at 09:44
You can probably do quite well just taking inspiration from the existing `cvtColor` implementation. It's a fair sized bite to swallow tho. In general it first tries to use OpenCL if available and implemented for the given conversion, then tries a HAL version (for very few specific things like Tegra) if available and implemented, then it may try IPP if available and implemented for the given conversion, and finally a baseline implementation, which for YUV conversions seems to use `cv::ParallelLoopImpl` with `cv::parallel_for_`. — Dan Mašek, Apr 23 '18 at 00:33
[Current state](https://pastebin.com/1n0GvbTR) of me playing around with implementing this colour conversion. I haven't timed it yet, but it's based off the existing OpenCV code (baseline). The results look quite reasonable. I'll play with it more tomorrow. Haven't looked at the Cuda stuff yet. | This might make a nice patch to OpenCV... it's obviously missing. — Dan Mašek, Apr 23 '18 at 00:37
[Adding some timing](https://pastebin.com/PArnVcNX), that produces [this output](https://pastebin.com/1k1aXdQm) on my i4930k with NVIDIA GTX 760. Using 12 threads (this is what OpenCV will do by default) I get about 7ms to convert a 256 x 65536 BGR image. Is that fast enough for you? I'm still trying to grok the OpenCL implementation. The CUDA version of `cvtColor` doesn't seem to support YUV 4:2:2 in either direction. — Dan Mašek, Apr 23 '18 at 22:59
This should be fast enough. I will test it as soon as I can and get back to you. Thanks — Avner Gidron, May 01 '18 at 09:00
@AvnerGidron Great. I haven't looked at any of the other implementations since I wasn't getting any response, but I'll renew my efforts. Related [OpenCV issue](https://github.com/opencv/opencv/issues/9587). — Dan Mašek, May 03 '18 at 02:38
@DanMašek I finally managed to make it work, but it seems that the conversion is not totally correct. I can see a resemblance to the original image, but it is not the original. Have you tested it with real images? — Avner Gidron, May 14 '18 at 14:32

score 3 · Answer 1 · answered Apr 14 '21 at 18:04

A simple implementation for OpenCV:

void rgb_to_yuv422_uyvy(const cv::Mat& rgb, cv::Mat& yuv) {
    // The loop below reads two pixels per iteration, so the width must be even.
    assert(rgb.size() == yuv.size() &&
           rgb.cols % 2 == 0 &&
           rgb.depth() == CV_8U &&
           rgb.channels() == 3 &&
           yuv.depth() == CV_8U &&
           yuv.channels() == 2);
    for (int ih = 0; ih < rgb.rows; ih++) {
        const uint8_t* rgbRowPtr = rgb.ptr<uint8_t>(ih);
        uint8_t* yuvRowPtr = yuv.ptr<uint8_t>(ih);

        for (int iw = 0; iw < rgb.cols; iw = iw + 2) {
            const int rgbColIdxBytes = iw * rgb.elemSize();
            const int yuvColIdxBytes = iw * yuv.elemSize();

            const uint8_t R1 = rgbRowPtr[rgbColIdxBytes + 0];
            const uint8_t G1 = rgbRowPtr[rgbColIdxBytes + 1];
            const uint8_t B1 = rgbRowPtr[rgbColIdxBytes + 2];
            const uint8_t R2 = rgbRowPtr[rgbColIdxBytes + 3];
            const uint8_t G2 = rgbRowPtr[rgbColIdxBytes + 4];
            const uint8_t B2 = rgbRowPtr[rgbColIdxBytes + 5];

            const int Y  =  (0.257f * R1) + (0.504f * G1) + (0.098f * B1) + 16.0f ;
            const int U  = -(0.148f * R1) - (0.291f * G1) + (0.439f * B1) + 128.0f;
            const int V  =  (0.439f * R1) - (0.368f * G1) - (0.071f * B1) + 128.0f;
            const int Y2 =  (0.257f * R2) + (0.504f * G2) + (0.098f * B2) + 16.0f ;

            yuvRowPtr[yuvColIdxBytes + 0] = cv::saturate_cast<uint8_t>(U );
            yuvRowPtr[yuvColIdxBytes + 1] = cv::saturate_cast<uint8_t>(Y );
            yuvRowPtr[yuvColIdxBytes + 2] = cv::saturate_cast<uint8_t>(V );
            yuvRowPtr[yuvColIdxBytes + 3] = cv::saturate_cast<uint8_t>(Y2);
        }
    }
}

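Note that this assumes (and checks) 8-bit, 3-channel RGB input and writes the UYVY flavor of YUV422. I found this to be quite fast as is, and since the work is embarrassingly parallel it can be sped up further by splitting the rows across threads.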
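Since every row is independent, the conversion can be split across threads. Below is a minimal sketch of that idea on raw buffers, without OpenCV, so it is not the answer's code: the names `rgb_to_uyvy_parallel`, `convert_rows` and `clamp_u8` are hypothetical, rows are assumed to be tightly packed (no padding), and the width is assumed to be even. It uses the same coefficients and the same truncating float-to-int conversion as the function above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Clamp an int to the [0, 255] range of a byte.
static std::uint8_t clamp_u8(int v) {
    return static_cast<std::uint8_t>(std::min(255, std::max(0, v)));
}

// Convert rows [rowBegin, rowEnd) of a tightly packed RGB24 image to UYVY.
static void convert_rows(const std::uint8_t* rgb, std::uint8_t* uyvy,
                         int width, int rowBegin, int rowEnd) {
    for (int row = rowBegin; row < rowEnd; ++row) {
        const std::uint8_t* src = rgb + static_cast<std::size_t>(row) * width * 3;
        std::uint8_t* dst = uyvy + static_cast<std::size_t>(row) * width * 2;
        for (int col = 0; col < width; col += 2, src += 6, dst += 4) {
            const float R1 = src[0], G1 = src[1], B1 = src[2];
            const float R2 = src[3], G2 = src[4], B2 = src[5];
            // Chroma is taken from the first pixel of each pair, as in the answer.
            const int Y1 = static_cast<int>( 0.257f * R1 + 0.504f * G1 + 0.098f * B1 + 16.0f);
            const int U  = static_cast<int>(-0.148f * R1 - 0.291f * G1 + 0.439f * B1 + 128.0f);
            const int V  = static_cast<int>( 0.439f * R1 - 0.368f * G1 - 0.071f * B1 + 128.0f);
            const int Y2 = static_cast<int>( 0.257f * R2 + 0.504f * G2 + 0.098f * B2 + 16.0f);
            dst[0] = clamp_u8(U);
            dst[1] = clamp_u8(Y1);
            dst[2] = clamp_u8(V);
            dst[3] = clamp_u8(Y2);
        }
    }
}

// Split the image into contiguous bands of rows, one band per worker thread.
void rgb_to_uyvy_parallel(const std::uint8_t* rgb, std::uint8_t* uyvy,
                          int width, int height, unsigned numThreads) {
    assert(width >= 0 && height >= 0 && width % 2 == 0);
    if (width == 0 || height == 0) return;
    // Never start more threads than there are rows.
    numThreads = std::max(1u, std::min(numThreads, static_cast<unsigned>(height)));
    const int n = static_cast<int>(numThreads);
    const int rowsPerThread = (height + n - 1) / n;

    std::vector<std::thread> workers;
    for (int begin = 0; begin < height; begin += rowsPerThread) {
        const int end = std::min(height, begin + rowsPerThread);
        workers.emplace_back(convert_rows, rgb, uyvy, width, begin, end);
    }
    for (std::thread& w : workers) w.join();
}
```

With a continuous `cv::Mat`, something like `rgb_to_uyvy_parallel(rgb.data, yuv.data, rgb.cols, rgb.rows, std::thread::hardware_concurrency())` would be the call. For repeated conversions a thread pool (or `cv::parallel_for_`, which the comments above mention) avoids paying thread start-up cost on every frame.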

score 0 · Answer 2 · answered Apr 22 '18 at 09:13

In this somewhat related answer, they suggest using Intel Performance Primitives (IPP), and the OP seemed to achieve the expected results (real-time conversion of many PAL streams).

Marco Pantaleoni

IPP is not free as far as I know. I would prefer to work with open source. – Avner Gidron, May 01 '18 at 10:47

score 0 · Accepted Answer · answered Sep 20 '18 at 06:41

I solved my problem using OpenCL, following this: Tutorial: Simple start with OpenCL and C++

I changed the conversion to go from Format_ARGB32_Premultiplied to YUV422, but it can easily be adapted to any other format.

openclwrapper.h:

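#pragma once

#include <string>
#include <vector>
#include <CL/cl.hpp> // OpenCL C++ bindings as used in the tutorial (assumption: newer SDKs ship <CL/opencl.hpp> instead)

class OpenClWrapper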
{
public:
    OpenClWrapper(size_t width, size_t height);
    ~OpenClWrapper();

    void RGB2YUV422(unsigned int * yuvImg, unsigned char * rgbImg);

private:
    std::vector<cl::Platform> m_all_platforms;
    std::vector<cl::Device> m_all_devices;
    cl::Platform m_default_platform;
    cl::Device m_default_device;
    cl::Context m_context;
    cl::Program::Sources m_sources;
    cl::Program m_program;
    cl::CommandQueue m_queue;
    cl::Buffer m_buffer_yuv;
    cl::Buffer m_buffer_rgb;
    std::string m_kernel_code;

    size_t m_width;
    size_t m_height;

};

openclwrapper.cpp:

#include "openclwrapper.h"
#include <cstdlib>   // exit
#include <iostream>
#include <sstream>

OpenClWrapper::OpenClWrapper(size_t width, size_t height) :
    m_width(width),   // initializers listed in member declaration order
    m_height(height)
{
    //get all platforms (drivers)
       cl::Platform::get(&m_all_platforms);
       if(m_all_platforms.size()==0){
           std::cout<<" No platforms found. Check OpenCL installation!\n";
           exit(1);
       }
       m_default_platform=m_all_platforms[0];

       //get default device of the default platform
       m_default_platform.getDevices(CL_DEVICE_TYPE_ALL, &m_all_devices);
       if(m_all_devices.size()==0){
           std::cout<<" No devices found. Check OpenCL installation!\n";
           exit(1);
       }
       m_default_device=m_all_devices[0];


       m_context = cl::Context({m_default_device}); // assign by value; a copy of a heap-allocated object would leak it

       std::ostringstream oss;

       oss <<
               "   void kernel RGB2YUV422(global const unsigned char rgbImg[" << m_height << "][" << m_width << "*4], global unsigned int yuvImg[" << m_height << "][" << m_width << "/2]){       \n"
               "       int x_idx = get_global_id(0);                                                                                        \n"
               "       int y_idx = get_global_id(1)*8;                                                                                      \n"
               "       int alpha1 = rgbImg[x_idx][y_idx+3];                                                                                 \n"
               "       int alpha2 = rgbImg[x_idx][y_idx+7];                                                                                 \n"
               "       // un-premultiply: multiply first, then divide, and guard against alpha == 0                                         \n"
               "       unsigned char R1 = alpha1 ? rgbImg[x_idx][y_idx+2] * 255 / alpha1 : 0;                                               \n"
               "       unsigned char G1 = alpha1 ? rgbImg[x_idx][y_idx+1] * 255 / alpha1 : 0;                                               \n"
               "       unsigned char B1 = alpha1 ? rgbImg[x_idx][y_idx]   * 255 / alpha1 : 0;                                               \n"
               "       unsigned char R2 = alpha2 ? rgbImg[x_idx][y_idx+6] * 255 / alpha2 : 0;                                               \n"
               "       unsigned char G2 = alpha2 ? rgbImg[x_idx][y_idx+5] * 255 / alpha2 : 0;                                               \n"
               "       unsigned char B2 = alpha2 ? rgbImg[x_idx][y_idx+4] * 255 / alpha2 : 0;                                               \n"

               "       unsigned char Y1 = (unsigned char)(0.299000*R1 + 0.587000*G1 + 0.114000*B1);                                         \n"
               "       unsigned char Y2 = (unsigned char)(0.299000*R2 + 0.587000*G2 + 0.114000*B2);                                         \n"
               "       unsigned char U = (unsigned char)(-0.168736*R1-0.331264*G1+0.500000*B1+128);//(0.492*(B1-Y1));                       \n"
               "       unsigned char V = (unsigned char)(0.500000*R1-0.418688*G1-0.081312*B1+128);//(0.877*(R1-Y1));                        \n"
               "       yuvImg[get_global_id(0)][get_global_id(1)] = (unsigned int)(Y2 << 24 | V << 16 | Y1 << 8 | U);                       \n"
               "   }                                                                                                                        ";

       m_kernel_code = oss.str();

       m_sources.push_back({m_kernel_code.c_str(),m_kernel_code.length()});

       m_program = cl::Program(m_context,m_sources); // assign by value; no heap allocation needed
       if(m_program.build({m_default_device})!=CL_SUCCESS){
           std::cout<<" Error building: "<<m_program.getBuildInfo<CL_PROGRAM_BUILD_LOG>(m_default_device)<<"\n";
           exit(1);
       }


       // create buffers on the device
       m_buffer_yuv = cl::Buffer(m_context,CL_MEM_READ_WRITE,sizeof(unsigned int)*(m_width*m_height/2)); // each 32-bit cell holds two pixels, since YUV422 uses 16 bits per pixel
       m_buffer_rgb = cl::Buffer(m_context,CL_MEM_READ_WRITE,sizeof(unsigned char)*(m_width*m_height*4)); // each pixel is represented by 4 bytes (alpha, RGB)

}

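OpenClWrapper::~OpenClWrapper(){
    // Nothing to release by hand: the cl:: wrapper members (buffers, program,
    // context) release their OpenCL handles in their own destructors.
}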

void OpenClWrapper::RGB2YUV422(unsigned int * yuvImg, unsigned char * rgbImg){


    cl::CommandQueue queue(m_context,m_default_device);
       //write rgb image to the OpenCl buffer
       queue.enqueueWriteBuffer(m_buffer_rgb,CL_TRUE,0,sizeof(unsigned char)*(m_width*m_height*4),rgbImg);


       //run the kernel
       cl::Kernel kernel_rgb2yuv=cl::Kernel(m_program,"RGB2YUV422");
       kernel_rgb2yuv.setArg(0,m_buffer_rgb);
       kernel_rgb2yuv.setArg(1,m_buffer_yuv);
       queue.enqueueNDRangeKernel(kernel_rgb2yuv,cl::NullRange,cl::NDRange(m_height,(m_width/2)),cl::NullRange); //the second dimension is width/2 because each work item converts two RGB pixels into one 32-bit word (YUV422 needs 16 bits per pixel)
       queue.finish();

       //read result yuv Image from the device to yuv Image pointer
       queue.enqueueReadBuffer(m_buffer_yuv,CL_TRUE,0,sizeof(unsigned int)*(m_width*m_height/2),yuvImg);

}
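One comment above reports that a converted image looked off, so it can help to have a plain CPU reference to compare the GPU output against. The following is a rough sketch, not part of the original answer: the function name `argb32pm_to_uyvy_reference` is hypothetical, the input is assumed to be tightly packed with bytes in B, G, R, A order (the order the kernel reads them in), and un-premultiplication is done as `channel * 255 / alpha` in integer math. It uses the same full-range coefficients and the same word layout (`Y2 << 24 | V << 16 | Y1 << 8 | U`) as the kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Undo premultiplication for one channel; a fully transparent pixel maps to 0.
static std::uint8_t unpremultiply(std::uint8_t c, std::uint8_t alpha) {
    if (alpha == 0) return 0;
    const unsigned v = c * 255u / alpha;
    return static_cast<std::uint8_t>(v > 255u ? 255u : v);
}

// CPU reference: premultiplied BGRA bytes in, one 32-bit UYVY word per pixel pair out.
void argb32pm_to_uyvy_reference(const std::uint8_t* bgra, std::uint32_t* yuv,
                                std::size_t width, std::size_t height) {
    assert(width % 2 == 0);
    for (std::size_t row = 0; row < height; ++row) {
        for (std::size_t pair = 0; pair < width / 2; ++pair) {
            const std::uint8_t* p = bgra + (row * width + pair * 2) * 4;
            const double B1 = unpremultiply(p[0], p[3]);
            const double G1 = unpremultiply(p[1], p[3]);
            const double R1 = unpremultiply(p[2], p[3]);
            const double B2 = unpremultiply(p[4], p[7]);
            const double G2 = unpremultiply(p[5], p[7]);
            const double R2 = unpremultiply(p[6], p[7]);

            // Luma for both pixels; chroma from the first pixel only, as in the kernel.
            const std::uint32_t Y1 = static_cast<std::uint32_t>(0.299 * R1 + 0.587 * G1 + 0.114 * B1);
            const std::uint32_t Y2 = static_cast<std::uint32_t>(0.299 * R2 + 0.587 * G2 + 0.114 * B2);
            const std::uint32_t U  = static_cast<std::uint32_t>(-0.168736 * R1 - 0.331264 * G1 + 0.5 * B1 + 128);
            const std::uint32_t V  = static_cast<std::uint32_t>(0.5 * R1 - 0.418688 * G1 - 0.081312 * B1 + 128);

            yuv[row * (width / 2) + pair] = (Y2 << 24) | (V << 16) | (Y1 << 8) | U;
        }
    }
}
```

When comparing against the device result, it is probably wise to allow a difference of one per byte, since the GPU may evaluate the floating-point expressions at a different precision.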
