
I'm looking for an efficient way to bit-shift left (<<) 10-bit values that are stored within a byte array, using C++/Win32.

I am receiving an uncompressed 4:2:2 10-bit video stream via UDP. The data is stored within an unsigned char array because of the way the bits are packed.

The data is always sent so that groups of pixels finish on a byte boundary (in this case, 4 pixels sampled at a bit depth of 10 use 5 bytes, since 4 × 10 bits = 40 bits):

[Figure: Bit Packing]
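For reference, the packing in the example further down is MSB-first: the four 10-bit samples are concatenated and the 40-bit stream is then cut into bytes. A hypothetical sender-side helper that produces this layout might look like the sketch below. `PackPGroup` is my own name, and the layout is inferred from the example input and output rather than from a spec.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sender-side helper (not from the question or answer):
// packs four 10-bit samples MSB-first into 5 bytes, i.e. the bitstream
// s0[9..0] s1[9..0] s2[9..0] s3[9..0] is split into 8-bit bytes.
void PackPGroup(const uint16_t samples[4], uint8_t bytesOut[5])
{
    for (int i = 0; i < 4; i++)
    {
        assert(samples[i] < 1024); // each sample must fit in 10 bits
    }
    bytesOut[0] = (uint8_t)(samples[0] >> 2);                                  // s0 bits 9..2
    bytesOut[1] = (uint8_t)(((samples[0] & 0x3u) << 6) | (samples[1] >> 4));  // s0 bits 1..0, s1 bits 9..4
    bytesOut[2] = (uint8_t)(((samples[1] & 0xFu) << 4) | (samples[2] >> 6));   // s1 bits 3..0, s2 bits 9..6
    bytesOut[3] = (uint8_t)(((samples[2] & 0x3Fu) << 2) | (samples[3] >> 8));  // s2 bits 5..0, s3 bits 9..8
    bytesOut[4] = (uint8_t)(samples[3] & 0xFFu);                               // s3 bits 7..0
}
```

Feeding it the four samples 0x1D5, 0x3A0, 0x2BE, 0x206 (the 10-bit values in the example) reproduces the example's `byteArray`.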

The renderer I am using (Media Foundation Enhanced Video Renderer) requires that 10-bit values be placed into a 16-bit WORD with 6 padding bits to the right. While this is annoying, I assume it's to help them keep each sample aligned in memory:

[Figure: 10-bit representation with padding]
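Put differently, each output WORD is just the 10-bit sample shifted left by 6, with the low 6 bits zero. The two helpers below are hypothetical names used only to illustrate that layout:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers illustrating the padded layout:
// the 10-bit sample occupies bits 15..6 of the word, and bits 5..0 are zero.
inline uint16_t PadTo16(uint16_t sample10)
{
    assert(sample10 < 1024);           // must be a 10-bit value
    return (uint16_t)(sample10 << 6);  // move the sample into the top 10 bits
}

inline uint16_t UnpadFrom16(uint16_t word16)
{
    return (uint16_t)(word16 >> 6);    // drop the 6 padding bits
}
```

For example, the first sample in the example below, 0b0111010101 (0x1D5), becomes 0x7540, which is the first desired output WORD.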

What is an efficient way of left-shifting each 10-bit value by 6 bits (and moving it to a new array if needed)? Although I will be receiving varying lengths of data, they will always be made up of these 40-bit blocks.

I'm sure a crude loop with some bit masking would suffice, but that sounds expensive to me, and I have to process 1,500 packets/second, each with ~1,200 bytes of payload.
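A rough throughput estimate, assuming every packet carries exactly 1,200 bytes made up of whole 5-byte pGroups:

```latex
\begin{aligned}
1500\ \tfrac{\text{packets}}{\text{s}} \times 1200\ \tfrac{\text{bytes}}{\text{packet}} &= 1.8 \times 10^{6}\ \tfrac{\text{bytes}}{\text{s}} \\
\frac{1.8 \times 10^{6}\ \text{bytes/s}}{5\ \text{bytes/pGroup}} &= 3.6 \times 10^{5}\ \text{pGroups/s} \\
3.6 \times 10^{5}\ \text{pGroups/s} \times 4\ \text{samples/pGroup} &= 1.44 \times 10^{6}\ \text{samples/s}
\end{aligned}
```

That is about 1.4 million samples per second, each needing only a few masks, shifts, and ORs, so a plain loop is likely not the bottleneck; profiling would confirm it.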

Edit for clarity

Example Input:

unsigned char byteArray[5] = {0b01110101, 0b01111010, 0b00001010, 0b11111010, 0b00000110};

Desired Output:

WORD wordArray[4] = {0b0111010101000000, 0b1110100000000000, 0b1010111110000000, 0b1000000110000000};

(or the same resulting data in a byte array)
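Working through the example by hand: concatenating the five input bytes gives the 40-bit stream 0111010101 1110100000 1010111110 1000000110, and each 10-bit group shifted left by 6 is one WORD of the desired output. One way to express that directly is to load the whole pGroup into a 64-bit integer and peel off the fields. This is an alternative sketch, not the answer's code, and `ProcessPGroup64` is a hypothetical name:

```cpp
#include <cassert>
#include <cstdint>

// Alternative sketch: load the 5 bytes MSB-first into one 64-bit integer,
// then extract four 10-bit fields and left-align each one in 16 bits.
void ProcessPGroup64(const uint8_t in[5], uint16_t out[4])
{
    assert(in != nullptr && out != nullptr);
    uint64_t bits = 0;
    for (int i = 0; i < 5; i++)
    {
        bits = (bits << 8) | in[i];  // after the loop, bits 39..0 hold the pGroup
    }
    for (int k = 0; k < 4; k++)
    {
        // Sample k occupies bits (39 - 10k) down to (30 - 10k).
        const uint64_t sample = (bits >> (30 - 10 * k)) & 0x3FFu;
        out[k] = (uint16_t)(sample << 6);  // 10 data bits on the left, 6 zero bits on the right
    }
}
```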

Asked by Aaron165; edited by Gabriel Staples.
  • Start by getting functional and readable code. Optimize later, but only if needed. Speed profile your code to see how much of a bottleneck it is. But, that requires having some code. So, start by getting functional and readable code. Optimize later... – Gabriel Staples Mar 17 '21 at 15:57
  • Should I use a bit mask/loop implementation? I assume it would be possible if I started from the leftmost bit, counted a byte and 2 bits across, and then left-shifted 6 bits. I could then repeat the process on the new data, as my new 16-bit WORD would be to my "left" and the unchanged data to the right. – Aaron165 Mar 17 '21 at 15:58
  • I do not think I understand the problem. Expressed in layman's terms, do you need to extract 10 bits from the stream of chars, pack them into 2 bytes (padding in the lower bits), and send them on, or is there more to it? – SergeyA Mar 17 '21 at 16:02
  • @SergeyA Yes, that is exactly what I need to do (thank you for putting it so simply, something I usually struggle with). – Aaron165 Mar 17 '21 at 16:06
  • How is 4:2:2 10 bits? 4+2+2 = 8 bits, no? – Gabriel Staples Mar 17 '21 at 16:10
  • Can we process it in groups of 4 pixels at a time? That would give us perfect 40-bit = 5-byte aligned chunks to work with, which would make this significantly easier. – Gabriel Staples Mar 17 '21 at 16:11
  • And yes, you should just use a loop and bit masks and shifts, which are all very fast in C and C++. – Gabriel Staples Mar 17 '21 at 16:12
  • @GabrielStaples 4:2:2 is a chroma subsampling method for the pixels, but 10 bits is the depth of each sample. – Aaron165 Mar 17 '21 at 16:15
  • @GabrielStaples Yes, it would be possible to process the data 4 pixels (40 bits) at a time, as my entire payload for each packet is made of these chunks (also called pGroups in the video world). – Aaron165 Mar 17 '21 at 16:17
  • Can't you use MF's Color Converter DSP: https://learn.microsoft.com/en-us/windows/win32/medfound/colorconverter or Video Processor DSP: https://learn.microsoft.com/en-us/windows/win32/medfound/video-processor-mft ? – Simon Mourier Mar 17 '21 at 16:55
  • @SimonMourier I don't believe so; that only accepts a valid input format, which my data currently is not (their 10-bit formats still assume the 6 padding bits that make each sample a full WORD). – Aaron165 Mar 17 '21 at 17:03
  • If anyone could give me a starting point (maybe how to use masks etc. to get the first 10 bits of a byte array), I'd be grateful! – Aaron165 Mar 17 '21 at 17:30
  • I fixed element 3 in `wordArray` in your question. You had `101011111000000` but needed `1010111110000000` (you were missing 1 zero at the end). – Gabriel Staples Mar 17 '21 at 17:51
  • In [my answer](https://stackoverflow.com/a/66678338/4561887), change that back to being wrong and you'll see it print out the actual and expected results to help you spot the bug. – Gabriel Staples Mar 17 '21 at 17:53
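The mask-and-loop approach asked about in the comments can be sketched by locating each sample from its bit offset: sample k starts at bit 10k of the stream, i.e. in byte 10k / 8, at 10k mod 8 bits from that byte's MSB. `ReadSampleLeftAligned` is a hypothetical helper, and it assumes the buffer contains only whole 5-byte pGroups:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Reads sample k from a packed, MSB-first 10-bit stream and returns it
// left-aligned in 16 bits (data in bits 15..6, zeros in bits 5..0).
// Assumption: len is a whole number of 5-byte pGroups, so the second byte always exists.
uint16_t ReadSampleLeftAligned(const uint8_t* bytes, size_t len, size_t k)
{
    const size_t bitOffset = k * 10;                              // first bit of sample k
    const size_t byteIndex = bitOffset / 8;                       // byte holding that bit
    const unsigned shift = static_cast<unsigned>(bitOffset % 8);  // 0, 2, 4, or 6
    assert(byteIndex + 1 < len);
    // Two bytes always cover the 10 bits, because shift is at most 6.
    const unsigned window = ((unsigned)bytes[byteIndex] << 8) | bytes[byteIndex + 1];
    // Move sample k into bits 15..6 and clear everything else.
    return (uint16_t)((window << shift) & 0xFFC0u);
}
```

Calling it for k = 0, 1, ..., len * 8 / 10 - 1 on the example bytes yields the four desired WORDs. The unrolled per-pGroup version in the answer avoids the division and modulo, since the offsets repeat every 5 bytes.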

1 Answer

This does the job:

void ProcessPGroup(const uint8_t byteArrayIn[5], uint16_t twoByteArrayOut[4])
{
    twoByteArrayOut[0] = (((uint16_t)byteArrayIn[0] & 0b11111111u) << (0 + 8)) | (((uint16_t)byteArrayIn[1] & 0b11000000u) << 0);
    twoByteArrayOut[1] = (((uint16_t)byteArrayIn[1] & 0b00111111u) << (2 + 8)) | (((uint16_t)byteArrayIn[2] & 0b11110000u) << 2);
    twoByteArrayOut[2] = (((uint16_t)byteArrayIn[2] & 0b00001111u) << (4 + 8)) | (((uint16_t)byteArrayIn[3] & 0b11111100u) << 4);
    twoByteArrayOut[3] = (((uint16_t)byteArrayIn[3] & 0b00000011u) << (6 + 8)) | (((uint16_t)byteArrayIn[4] & 0b11111111u) << 6);
}

Don't be confused by the [5] and [4] in the function signature above. An array parameter is adjusted to a pointer, so the language does not enforce these sizes; they only tell you, the user, the mandatory number of elements each array must have. See my answer here on this: Passing an array as an argument to a function in C. Passing an array that is shorter will result in undefined behavior and is a bug!
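In C++ specifically, the sizes can be made part of the parameter types by taking the arrays by reference, so a too-short array fails to compile instead of causing undefined behavior. This is a variant I'm suggesting, not the answer's original signature; `ProcessPGroupChecked` is a hypothetical name:

```cpp
#include <cassert>
#include <cstdint>

// Same masks and shifts as ProcessPGroup(), but with reference-to-array
// parameters: the compiler now checks that exactly 5 and 4 elements are passed.
void ProcessPGroupChecked(const uint8_t (&in)[5], uint16_t (&out)[4])
{
    out[0] = (uint16_t)((in[0] << 8) | (in[1] & 0xC0u));
    out[1] = (uint16_t)(((in[1] & 0x3Fu) << 10) | ((in[2] & 0xF0u) << 2));
    out[2] = (uint16_t)(((in[2] & 0x0Fu) << 12) | ((in[3] & 0xFCu) << 4));
    out[3] = (uint16_t)(((in[3] & 0x03u) << 14) | (in[4] << 6));
}

// uint8_t tooShort[4] = {};
// uint16_t words[4];
// ProcessPGroupChecked(tooShort, words);  // compile error: a uint8_t[4] cannot bind to const uint8_t (&)[5]
```

The trade-off is that you can no longer pass a pointer into the middle of a larger payload buffer without a cast, so the pointer-style signature may be more convenient inside a loop.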

Full test code (download it in my eRCaGuy_hello_world repo here: cpp/process_10_bit_video_data.cpp):

test.cpp

/*

GS
17 Mar. 2021

To compile and run:
    mkdir -p bin && g++ -Wall -Wextra -Werror -ggdb -std=c++17 -o bin/test \
    test.cpp && bin/test

*/

#include <bitset>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <iostream>

// Get the number of elements in any C array
// - Usage example: [my own answer]:
//   https://arduino.stackexchange.com/questions/80236/initializing-array-of-structs/80289#80289
#define ARRAY_LEN(array) (sizeof(array)/sizeof(array[0]))

/// \brief      Process a packed video P group, which is 4 pixels of 10 bits each (exactly 5 uint8_t
///             bytes) into a uint16_t 4-element array (1 element per pixel).
/// \details    Each group of 10-bits for a pixel will be placed into a 16-bit word, with all 10
///             bits left-shifted to the far left edge, leaving 6 empty (zero) bits in the right
///             side of the word.
/// \param[in]  byteArrayIn  5 bytes of 10-bit pixel data for exactly 4 pixels; any array size < 5
///                        will result in undefined behavior! So, ensure you pass the proper array
///                        size in!
/// \param[out] twoByteArrayOut  The output array into which the 4 pixels will be packed, 10 bits per
///                        16-bit word, all 10 bits shifted to the left edge; any array size < 4
///                        will result in undefined behavior!
/// \return     None
void ProcessPGroup(const uint8_t byteArrayIn[5], uint16_t twoByteArrayOut[4])
{
    twoByteArrayOut[0] = (((uint16_t)byteArrayIn[0] & 0b11111111u) << (0 + 8)) | (((uint16_t)byteArrayIn[1] & 0b11000000u) << 0);
    twoByteArrayOut[1] = (((uint16_t)byteArrayIn[1] & 0b00111111u) << (2 + 8)) | (((uint16_t)byteArrayIn[2] & 0b11110000u) << 2);
    twoByteArrayOut[2] = (((uint16_t)byteArrayIn[2] & 0b00001111u) << (4 + 8)) | (((uint16_t)byteArrayIn[3] & 0b11111100u) << 4);
    twoByteArrayOut[3] = (((uint16_t)byteArrayIn[3] & 0b00000011u) << (6 + 8)) | (((uint16_t)byteArrayIn[4] & 0b11111111u) << 6);
}

// Reference: https://stackoverflow.com/questions/7349689/how-to-print-using-cout-a-number-in-binary-form/7349767
void PrintArrayAsBinary(const uint16_t* twoByteArray, size_t len)
{
    std::cout << "{\n";
    for (size_t i = 0; i < len; i++)
    {
        std::cout << std::bitset<16>(twoByteArray[i]);
        if (i < len - 1)
        {
            std::cout << ",";
        }
        std::cout << std::endl;
    }
    std::cout << "}\n";
}

int main()
{
    printf("Processing 10-bit video data example\n");

    constexpr uint8_t TEST_BYTE_ARRAY_INPUT[5] = {0b01110101, 0b01111010, 0b00001010, 0b11111010, 0b00000110};
    constexpr uint16_t TEST_TWO_BYTE_ARRAY_OUTPUT[4] = {
        0b0111010101000000, 0b1110100000000000, 0b1010111110000000, 0b1000000110000000};

    uint16_t twoByteArrayOut[4];
    ProcessPGroup(TEST_BYTE_ARRAY_INPUT, twoByteArrayOut);

    if (std::memcmp(twoByteArrayOut, TEST_TWO_BYTE_ARRAY_OUTPUT, sizeof(TEST_TWO_BYTE_ARRAY_OUTPUT)) == 0)
    {
        printf("TEST PASSED!\n");
    }
    else
    {
        printf("TEST ==FAILED!==\n");

        std::cout << "expected = \n";
        PrintArrayAsBinary(TEST_TWO_BYTE_ARRAY_OUTPUT, ARRAY_LEN(TEST_TWO_BYTE_ARRAY_OUTPUT));

        std::cout << "actual = \n";
        PrintArrayAsBinary(twoByteArrayOut, ARRAY_LEN(twoByteArrayOut));
    }

    return 0;
}

Sample run and output:

$ mkdir -p bin && g++ -Wall -Wextra -Werror -ggdb -std=c++17 \
-o bin/test test.cpp && bin/test
Processing 10-bit video data example
TEST PASSED!
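To convert a whole UDP payload, the per-pGroup logic can be run over consecutive 5-byte chunks. This is a sketch assuming the payload length is an exact multiple of 5; `ProcessPayload` is a hypothetical name, and the conversion lines are equivalent to `ProcessPGroup()` above, repeated so the block compiles on its own:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Converts a payload of packed 10-bit pGroups into left-aligned 16-bit words.
// Assumption: payloadLen is a multiple of 5 (whole pGroups only).
// wordsOut is resized to 4 words per 5 input bytes.
void ProcessPayload(const uint8_t* payload, size_t payloadLen, std::vector<uint16_t>& wordsOut)
{
    assert(payloadLen % 5 == 0);
    const size_t numGroups = payloadLen / 5;
    wordsOut.resize(numGroups * 4);
    for (size_t g = 0; g < numGroups; g++)
    {
        const uint8_t* in = payload + 5 * g;      // current 5-byte pGroup
        uint16_t* out = wordsOut.data() + 4 * g;  // its 4 output words
        out[0] = (uint16_t)((in[0] << 8) | (in[1] & 0xC0u));
        out[1] = (uint16_t)(((in[1] & 0x3Fu) << 10) | ((in[2] & 0xF0u) << 2));
        out[2] = (uint16_t)(((in[2] & 0x0Fu) << 12) | ((in[3] & 0xFCu) << 4));
        out[3] = (uint16_t)(((in[3] & 0x03u) << 14) | (in[4] << 6));
    }
}
```

Passing the same `std::vector` for every packet means that, once it has grown to the largest payload seen, `resize()` does not need to reallocate, which keeps per-packet allocations out of the hot path.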


References:

  1. How to print (using cout) a number in binary form?
  2. [my answer] Passing an array as an argument to a function in C
  3. [my eRCaGuy_hello_world repo] ARRAY_LEN() macro: see utilities.h
  4. std::memcmp (cppreference): https://en.cppreference.com/w/cpp/string/byte/memcmp

Keywords: c and c++ bitmasking and bit-shifting, bit-packing; bit-masking bit masking, bitshifting bit shifting, bitpacking bit packing, byte packing, lossless data compression

Answered by Gabriel Staples.