I wanted to check whether there is an idiomatic way, either as a compiler intrinsic or as a sequence of x86_64 SIMD instructions, to extract groups of bits from an integer, use each group as an index into a lookup table, and concatenate the outputs.
For example, if I have a lookup table with the characters 'a' to 'j' and extract 4 bits at a time, I can transform the number 0x7403 into the string "head". So, roughly:
uint16_t input = 0x7403;
const char *const table = "abcdefghij";
char output[5];
const int mask_width = 4;
simd_magic(output, (const char *) &input, mask_width, table);
output[4] = '\0';
printf("%s\n", output); /* prints head */
Essentially, I'm looking for an implementation of simd_magic, either as an asm block with SIMD instructions or as a compiler intrinsic.
/* For some i */
output[i + 0] = table[(input[i] >> 0) & 0xf];
output[i + 1] = table[(input[i] >> 4) & 0xf];
I can, of course, write a sequential for loop for this. But if I wanted to do this frequently and/or on a block of memory, I was wondering whether I can take advantage of ILP instead of working on one nibble at a time.
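For reference, here is a minimal sketch of the sequential baseline that any SIMD version would need to match. The name nibble_lookup and its signature are hypothetical, chosen only for illustration. It treats the input as an array of bytes and assumes the table has at least 16 entries, since a 4-bit index can reach 15 (the 10-character table above would be read out of bounds for nibbles above 9).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar baseline (name and signature are illustrative).
 * Writes 2 * n characters to out: for each input byte, the table entry
 * for the low nibble first, then the entry for the high nibble.
 * Assumes table has at least 16 entries, since a nibble can be 0..15.
 * Does not NUL-terminate out; the caller does that if needed. */
static void nibble_lookup(char *out, const uint8_t *in, size_t n,
                          const char *table)
{
    for (size_t i = 0; i < n; i++) {
        out[2 * i + 0] = table[(in[i] >> 0) & 0xf];
        out[2 * i + 1] = table[(in[i] >> 4) & 0xf];
    }
}
```

Note that nibble order matters. With the low nibble emitted first, as in the two-line fragment above, the little-endian bytes of 0x7403 (0x03, then 0x74) come out as "daeh". Producing "head" requires walking the nibbles from most significant to least significant instead.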