1

I wish to move bits 0,8,16,24 of a 32-bit value to bits 0,1,2,3 respectively. All other bits in the input and output will be zero.

Obviously I can do that like this:

c = c>>21 + c>>14 + c>>7 + c;
c &= 0xF;

But is there a faster (fewer instructions) way?

phuclv
  • 37,963
  • 15
  • 156
  • 475
Dijkstra
  • 2,490
  • 3
  • 21
  • 35
  • 5
    First, you code doesn't do what you ask it to, as there are other bits in `c` that will included in the addition. Secondly, you are counting the bits the wrong way around. The rightmost (least valued) bit is numbered 0. – Lindydancer Jan 10 '12 at 11:59
  • Thanks, I've changed the order of the bits. – Dijkstra Jan 10 '12 at 14:41
  • 1
    And I've clarified the constraints, so I think the code works now :) – Dijkstra Jan 10 '12 at 14:46
  • @Djikstra, that code still doesn't do what you want it to do. You need to mask off the bits you are combining together. – MSN Jan 10 '12 at 17:19
  • @MSN, I'm sorry I don't understand where it goes wrong. Note that 28 of the bits in the input are guaranteed to be zero. – Dijkstra Jan 10 '12 at 18:24
  • 1
    @Djikstra, given an int `0xffffffff`, you will get the wrong result. – MSN Jan 10 '12 at 18:45
  • @MSN, that's ok, I assert that won't happen :) – Dijkstra Jan 10 '12 at 18:52
  • duplicate problems, but working on 8 bytes at once instead of 4: [How to create a byte out of 8 bool values (and vice versa)?](https://stackoverflow.com/q/8461126/995714), [Intel x86 assembly optimization techniques in a sample problem](https://stackoverflow.com/q/1414911/995714), [What's the fastest way to pack 32 0/1 values into the bits of a single 32-bit variable?](https://stackoverflow.com/q/26200961/995714). Your code won't work if there's a carry from the higher digit. For example 0x81818181 will produce incorrect input. You need to use a bitwise `or` instead of add – phuclv Aug 21 '18 at 15:32

3 Answers3

2
c = (((c&BITS_0_8_16_24) * BITS_0_7_14_21) >> 21) & 0xF;

Or wait for Intel Haswell processor, doing all this in exactly one instruction (pext).

Update

Taking into account clarified constraints and assuming 32-bit unsigned values, the code may be simplified to this:

c = (c * BITS_7_14_21_28) >> 28;
Evgeny Kluev
  • 24,287
  • 7
  • 55
  • 98
  • the magic number would be 0b00010000001000000100000010000000 or 0x10204080 => `(c * 0x10204080) >> 28`. I used this technique [here](https://stackoverflow.com/a/51750902/995714) – phuclv Aug 21 '18 at 15:31
1

If you don't care about portability, and can use SSE instructions, look at the PMOVMSKB instruction and its compiler intrinsic. [I noticed that your bit positions are most significant (sign) bits of the 4 bytes comprising the 32-bit word.]

zvrba
  • 24,186
  • 3
  • 55
  • 65
  • if you don't care about portability and you have new enough x86 CPU then you can also use `PEXT` in BMI2 – phuclv Aug 21 '18 at 15:34
0

Instead of writing some obfuscated one-line goo, the below code is what I would write, for maximum portability and maintainability. I would let the optimizer worry about whether or not it is the most effective code.

#include <stdint.h>
#include <limits.h>
#include <stdio.h>

#define BITS_TO_MOVE  4

static const uint32_t OLD_MASK [BITS_TO_MOVE] =
{
  0x0008u,
  0x0080u,
  0x0800u,
  0x8000u
};

static const uint32_t NEW_MASK [BITS_TO_MOVE] =
{
  0x1000u,
  0x2000u,
  0x4000u,
  0x8000u
};


int main()
{
  uint32_t  c     = 0xAAAAu;
  uint32_t  new_c = 0;
  uint8_t   i;

  printf("%.4X\n", c);


  for(i=0; i<BITS_TO_MOVE; i++)
  {
    if ( (c & OLD_MASK[i]) > 0 )
    {
      new_c |= NEW_MASK[i];
    }
  }


  printf("%.4X\n", new_c);
  getchar();

  return 0;
}
Lundin
  • 195,001
  • 40
  • 254
  • 396
  • Optimizers are smart, but not smart enough to replace bit extraction code with a single instruction. "Portability" is a moot point: you don't have to bother yourself with it unless you KNOW upfront that the code has to run on multiple CPU platforms. – zvrba Jan 10 '12 at 15:42
  • @zvrba You never reuse old code you have written in other projects? Also, the same can be said about performance, you don't have to bother with it unless you know that it is needed. I reckon the above code will be "fast enough", perhaps not a single instruction but not worse than maybe 3-4 either. Depending on CPU type of course. – Lundin Jan 10 '12 at 21:58
  • 1
    Reuse? It depends. The OP specifically asked about a faster way than his example, and you gave him something longer and probably slower. – zvrba Jan 11 '12 at 08:26