3

I need to store and apply permutations to 16-bit integers. The best solution I came up with is to store permutation as 64-bit integer where each 4 bits correspond to the new position of i-th bit, the application would look like:

int16 permute(int16 bits, int64 perm)
{
   int16 result = 0;
   for(int i = 0; i < 16; ++i)
      result |= ((bits >> i) & 1) * (1 << int( (perm >> (i*4))&0xf ));
   return result;
}

is there a faster way to do this? Thank you.

Ordev Agens
  • 151
  • 9
  • It may help if you can provide a slightly broader context. For example, do you need to perform the same permutation on many different bits (in which case you can prepare a lookup table) or apply the same permutation multiple times in sequence (in which case you can use a cycle decomposition). – Peter de Rivaz Apr 23 '17 at 20:04
  • Generally, I have a list of permutations (up to 9 factorial) and each of them is applied to a sequence of 512 integers (once per each integer). – Ordev Agens Apr 23 '17 at 20:14
  • So you have up to 512 times 9 factorial outputs from your program? – Peter de Rivaz Apr 23 '17 at 20:21
  • I think you want unsigned types here. It's implementation defined what happens when 1<<15 is assigned to `result` (although it's likely to work as one would expect). – Paul Hankin Apr 24 '17 at 00:17

1 Answers1

5

There are alternatives.

Any permutation can be handled by a Beneš network, and encoded as the masks that are the inputs to the multiplexers to apply the shuffle. This can be done reasonably efficiently in software too (not great but OK), it's just a bunch of butterfly permutations. The masks are a bit tricky to compute, but probably faster to apply than moving every bit on its own, though that depends on how many bits you're dealing with and 16 is not a lot.

Some smaller categories of shuffles can be handled by simpler (faster) networks, which you can also find on that page.

Finally in practice, on modern x86 hardware, there is the highly versatile pshufb function which can apply a permutation (but may include dupes and zeroes) to 16 bytes in (typically) a single cycle. It is slightly awkward to distribute the bits over the bytes, but once you're there it only takes a pshufb to permute and a pmovmskb to compress it back down to 16 bits.

Community
  • 1
  • 1
harold
  • 61,398
  • 6
  • 86
  • 164