I have an array of up to 32 rows (often less) of uint32_t values that are essentially just bitmaps - so a total of 1024 bits maximum. The data in these <=32 uint32_ts changes frequently (multiple times/millisecond).
For a vectorized algorithm I'm running, I need to extract some of the data from these bitmaps and rearrange the bits into the format my algorithm expects. The positions of the data I need to extract is known at runtime and will not change during the run. I want to find a way - even if it's a bit computationally heavy, as long as it's not memory-heavy I can frontload it - to compute some mapping function that will enable me to quickly extract the data into the format my algorithm expects, so I can run it as frequently as possible to keep up with the data.
The mapping looks something like this, presented in uint8_t binary format below for simplicity.
Input data (nominally up to 32 uint32_ts, presented here as 3 uint8_ts):
abcd efgh
ijkl mnop
qrst uvwx
I want to extract some of these, and create something like the following output data:
00000adu
00000fgx
Here, I'm only extracting 6 of the 24 bits. As a note, the data I extract will always:
- Be an even number of bits - I need pairs, for instance "af" and "dg" and "ux" above.
- Each "pair" will always come from either the same row (uint32_t) or column (bit position within all 32 of the uint32_ts). I will never have some data come from the same row and other data come from the same column; it's one or the other for a given configuration.
- The algorithm expects to operate on a pair of output uint32_ts. In the case where I extract more than 64 bits - and so the result won't fit in 2 uint32_ts - I'll have to expand into a pair of arrays of uint32_ts.
Obviously there's the naive approach, but it doesn't operate very quickly - I'm hoping to run this as many times a second as possible.
#define EXTRACT_PAIRS_COUNT 3
const uint8_t colExtractLeft[EXTRACT_PAIRS_COUNT] = {7, 4, 3}
const uint8_t colExtractRight[EXTRACT_PAIRS_COUNT] = {2, 1, 0}
const uint8_t rowExtract[EXTRACT_PAIRS_COUNT] = {0, 0, 2}
uint8_t input[3] = {abcdefgh, ijklmnop, qrstuvwx};
while (inputChanges) {
uint8_t outputLeft = 0;
uint8_t outputRight = 0;
for (size_t i=0; i<EXTRACT_PAIRS_COUNT; i++) {
outputLeft |= ((input[rowExtract[i]] >> colExtractLeft[i]) & 0x1) << i;
outputRight |= ((input[rowExtract[i]] >> colExtractRight[i]) & 0x1) << i;
}
runAlgorithm(outputLeft, outputRight);
}
(Note that the #defines and const variables at the top of that example are just for demonstration/clarity, in reality they're calculated at runtime)
How can I efficnetly map bits into this format?