5

I want to get a function that will set the n last bits of a numerical type to 1. For example:

bitmask (5) = 0b11111 = 31
bitmask (0) = 0

I, first, had this implementation (mask_t is just a typedef around uint64_t):

mask_t bitmask (unsigned short n) {
  return ((((mask_t) 1) << n) - 1;
}

Everything is fine except when the function hit bitmask (64) (the size of mask_t), then I get bitmask (64) = 0 in place of 64 bits set to 1.

So, I have two questions:

  1. Why do I have this behavior ? Pushing the 1 by 64 shifts on the left should clear the register and remain with 0, then applying the -1 should fill the register with 1s...

  2. What is the proper way to achieve this function ?

perror
  • 7,071
  • 16
  • 58
  • 85
  • 3
    You *do* remember that bit-numbers are zero-based? So for a 64-bit integer the bits are numbers `0` through `63` (inclusive). That means bit number 64 is invalid. Perhaps you should add some checks for valid bit numbers? – Some programmer dude Jul 27 '17 at 13:36
  • 1
    Shifting left by 64 bits is undefined behaviour. You may need to special case it with `~0`. – Jonathan Leffler Jul 27 '17 at 13:43
  • Why don't you just make 64 bits a special case? Check for `n == 64` and return the result you want... – lurker Jul 27 '17 at 13:44
  • 1
    How about `(uint64_t)(~0) >> (64 - n)`? – r3mainer Jul 27 '17 at 13:46
  • You're just getting lucky. Leftshift 64 bits has undefined behaviour, which means the behaviour you see/expect is not guaranteed between compilers. – Peter Jul 27 '17 at 13:46
  • Assuming there is a `shl` set of processor instructions, the operand of these instructions that has the number to shift may have requrements including that it is less than the register width. So the _processor_ may do, or do not, _anything_ when this condition is not met. So this is UB of the processor. – Paul Ogilvie Jul 27 '17 at 13:48
  • @squeamishossifrage : Your solution is nice but do not work on `0` (I just tried...). – perror Jul 27 '17 at 13:49
  • @perror Ah, yes that would be a problem ... – r3mainer Jul 27 '17 at 13:49
  • @squeamishossifrage If I'm not mistaken, wouldn't that arithmetic shift not change the value since the sign bit is 1? – Miket25 Jul 27 '17 at 14:09
  • 2
    @Miket25 `uint64_t` is unsigned. There is no sign bit. – r3mainer Jul 27 '17 at 14:09
  • 1
    @squeamishossifrage With `(uint64_t)(~0)`, cast is too late. --> `~((uint64_t)0)`. – chux - Reinstate Monica Jul 27 '17 at 15:50

3 Answers3

6

Yes this is a well known problem. There are easy ways to implement this function over the range 0..63 and over the range 1..64 (one way has been mentioned in the comments), but 0..64 is more difficult.

Of course you can just take either the "left shifting" or "right shifting" mask generation and then special-case the "missing" n,

uint64_t bitmask (unsigned short n) {
  if (n == 64) return -((uint64_t)1);
  return (((uint64_t) 1) << n) - 1;
}

Or

uint64_t bitmask (unsigned short n) {
  if (n == 0) return 0;
  uint64_t full = ~(uint64_t)0;
  return full >> (64 - n);
}

Either way tends to compile to a branch, though it technically doesn't have to.

You can do it without if (not tested)

uint64_t bitmask (unsigned int n) {
  uint64_t x = (n ^ 64) >> 6;
  return (x << (n & 63)) - 1;
}

The idea here is that we're going to either shift 1 left by some amount the same as in your original code, or 0 in the case that n = 64. Shifting 0 left by 0 is just going to be 0 again, subtracting 1 sets all 64 bits.


Alternatively if you're on a modern x64 platform and BZHI is available, a very fast (BZHI is fast on all CPUs that implement it) but limited-portability option is:

uint64_t bitmask (unsigned int n) {
  return _bzhi_u64(~(uint64_t)0, n);
}

This is even well-defined for n > 64, the actual count of 1's will be min(n & 0xFF, 64) because BZHI saturates but it reads only the lowest byte of the index.

harold
  • 61,398
  • 6
  • 86
  • 164
4

You cannot left shift by a value larger than or equal to the bit width of the type in question. Doing so invokes undefined behavior.

From section 6.5.7 of the C standard:

2 The integer promotions are performed on each of the operands. The type of the result is that of the promoted left operand. If the value of the right operand is negative or is greater than or equal to the width of the promoted left operand, the behavior is undefined.

You'll need to add a check for this in your code:

mask_t bitmask (unsigned short n) {
    if (n >= 64) {
        return ~(mask_t)0;
    } else {
        return (((mask_t) 1) << n) - 1;
    }
}
dbush
  • 205,898
  • 23
  • 218
  • 273
1

Finally, just for your information, I ended up by writing:

mask_t bitmask (unsigned short n) {
  return  (n < (sizeof (mask_t) * CHAR_BIT)) ? (((mask_t) 1) << n) - 1 : -1;
}

But, the answer of harold is so complete and well explained that I will select it as the answer.

perror
  • 7,071
  • 16
  • 58
  • 85