0

I wanted to check if there's an idiomatic way--either as compiler intrinsic, or as a set of x86_64 SIMD instructions--by which I can extract bits from an integer, and use those bits as index into a looking up table, and concatenate outputs.

For example, if I have the lookup table with characters 'a' to 'j', and I were to extract 4 bits at a time, I can transform the number 0x7403 into the string "head". So, roughly:

uint16_t input = 0x7403;
const char *const table = "abcdefghij";
char output[5];
const mask_width = 4;
simd_magic(output, (const char *) &input, mask_width, table);
output[4] = '\0';
printf("%s\n", output); /* prints head */

Essentially, I'm looking for the implementation for simd_magic, either as an asm block with SIMD instructions or as compiler intrinsic.

/* For some i */
output[i + 0] = table[(input[i] >> 0) & 0xf];
output[i + 1] = table[(input[i] >> 4) & 0xf];

I can, of course, write a sequential for loop for this. But if I wanted to do this frequently and/or on a block of memory, I was wondering if I can take advantage of ILP, instead of working with a nibble at a time.

Peter Cordes
  • 328,167
  • 45
  • 605
  • 847
Jeenu
  • 2,109
  • 2
  • 20
  • 27
  • 2
    For the look-up part: [`pshufb`](https://www.felixcloutier.com/x86/pshufb) -- assuming you don't have more than 16 entries. For extracting the nibbles: Is `mask_width` always 4, or could it be smaller (if yes, is it known at runtime)? Do you really want to extract bits from `input` starting with the highest bits? – chtz Dec 01 '22 at 17:51
  • @chtz, I think I've seen pshufb before. Extracting is only half the story; I'd like to also index into the look-up table with those extracted bits, retrieve the byte from there, write that byte into another location indexed by the same offset as the lookup. – Jeenu Dec 01 '22 at 18:16
  • 1
    Partial duplicate: [Why is masking needed before using a pshufb shuffle as a lookup table for nibbles?](https://stackoverflow.com/q/72978401) but that doesn't show how to unpack in input order. [How to convert a binary integer number to a hex string?](https://stackoverflow.com/q/53823756) does that, including using an 8-byte lookup table. But the main answer is asm, not intrinscs, and that's just one of the versions, the question isn't about that. – Peter Cordes Dec 01 '22 at 21:29
  • 1
    [How do I vectorize data\_i16\[0 to 15\]?](https://stackoverflow.com/q/61436326) shows doing this twice for each nibble and interleaving 8-bit into 16-bit results. – Peter Cordes Dec 01 '22 at 21:36
  • 1
    Are you asking about the general case, including where `mask_width = 3` or something that doesn't evenly divide 8? 2 and 4 are pretty easy, since you can mask/shift and interleave (`punpcklbw`) to get a vector with the fields in order. But that doesn't work for 3; a right shift won't put each every 3rd field at the bottom of a byte. And there's no 3-way interleave – Peter Cordes Dec 02 '22 at 08:56

0 Answers0