8

What is the most portable way to read and write the highest bit of an integer in C?

This is a Bloomberg interview question. I didn’t give best answer at that time. Can anyone please answer it?

ib.
  • 27,830
  • 11
  • 80
  • 100
Josh Morrison
  • 7,488
  • 25
  • 67
  • 86

5 Answers5

6

If the type is unsigned, it's easy:

(type)-1-(type)-1/2

For signed values, I know no way. If you find a way, it would answer several unanswered questions on SO:

C question: off_t (and other signed integer types) minimum and maximum values

Is there any way to compute the width of an integer type at compile-time?

Maybe others.

Community
  • 1
  • 1
R.. GitHub STOP HELPING ICE
  • 208,859
  • 35
  • 376
  • 711
5

First, note that there's no portable way to access the top bit if we're talking about signed integers; there's simply no single portable representation defined in the standard, so the meaning of 'top bit' can in principle vary. Additionally, C does not allow direct access to the bitwise representation; you can access the int as a char buffer, but you have no idea where the 'top bit' is located.

If we're only concerned with the non-negative range of a signed integer, and assuming said range has a size that is a power of two (if not, then we need to care about the signed representation again):

#define INT_MAX_BIT (INT_MAX - (INT_MAX >> 1))
#define SET_MAX_BIT(x) (x | INT_MAX_BIT)
#define CLEAR_MAX_BIT(x) (x & ~INT_MAX_BIT)

A similar approach can be used with unsigned ints, where it can be used to get the true top bit.

bdonlan
  • 224,562
  • 31
  • 268
  • 324
  • This only works for integer types that have limits specifies in `limits.h`. For instance, it does not work for `off_t`. – R.. GitHub STOP HELPING ICE Jan 26 '11 at 03:52
  • @R, true, but it's hard to see an efficient, portable method that doesn't use a `_MAX` macro and doesn't invoke undefined (or implementation-defined) behavior... – bdonlan Jan 26 '11 at 03:56
  • i'm sorry,can you explain it for me why "#define INT_MAX_BIT (INT_MAX - (INT_MAX >> 1))" works? I didn't get it – Josh Morrison Jan 26 '11 at 04:00
  • Typically `INT_MAX` has a value that has all 1s set (assuming INT_MAX is 2^n - 1 for some n). Shifting right one results in the top bit being cleared. Then subtracting from the original leaves only the top bit set. – bdonlan Jan 26 '11 at 04:08
  • 1
    The standard actually puts quite tight restrictions on the representation of integer types - for example, the maximum values of both signed and unsigned types must be `2**N - 1`. See section 6.2.6.2. – caf Jan 26 '11 at 08:58
  • Ah, I see; and it actually is restricted to one of three signed integer representations. Although there's nothing that says the sign bit must be the "top" bit :) – bdonlan Jan 26 '11 at 09:06
2

Here's a silly one, using:

Built-in Function: int __builtin_clz (unsigned int x)

Returns the number of leading 0-bits in x, starting at the most
significant bit position. If x is 0, the result is undefined. 

First attempt:

int get_msb(int x) { return x ? __buildin_clz(x) == 0 : 0; }

Note: it's a quirk of C that functions specifying int or unsigned int parameters can be called with the other type without warning. But, this probably involves a conversion - the C++ Standard 4.7.2 says:

If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source integer (modulo 2n where n is the number of bits used to represent the unsigned type). [Note: In a two's complement representation, this conversion is conceptual and there is no change in the bit pattern (if there is no truncation). ]

Which implies that the bit pattern may be changed if it's not a two's complement representation, which would stop this "solution" working reliably too. :-(

Chris's comment below provides a solution (incorporated here as a function rather than preprocessor macro):

int get_msb(int x) { return x ? __buildin_clz(*(unsigned*)&x) == 0 : 0; }
Tony Delroy
  • 102,968
  • 15
  • 177
  • 252
  • The workaround would be `#define msb(x) __builtin_clz(*(unsigned)&x)` but then you can't use it on literal numbers. The GCC workaround would be `#define msb(x) ({ typeof(x) _x = x; __builtin_clz(*(unsigned)&_x); })` – Chris Lutz Jan 26 '11 at 05:12
  • @Chris: that's neat... in the function above the `int x` parameter would receive literals anyway, and you can then do the cast as you suggest. I'll update the code above. Thanks! – Tony Delroy Jan 26 '11 at 05:33
  • The cast is technically undefined behavior, which means it's not guaranteed to work. In practice, it will reinterpret the address of a signed value as an unsigned value (which is rightly undefined behavior since the standard doesn't specify a particular sign representation) much the same way as `union { int i; unsigned u; } u; u.i = x; return __builtin_clz(u.j);` would. I used a macro so that it would work on both signed and unsigned `int` types, invoking UB only for the signed version. But any way you do it will end up being necessarily platform dependent. – Chris Lutz Jan 26 '11 at 05:45
  • @Chris: yes, I knew that, but wouldn't be surprised if GCC itself doesn't work on any systems that quirky... :-). – Tony Delroy Jan 26 '11 at 05:51
1

What's wrong with this one?

int get_msb(int n){
    return ((unsigned)n) >> (sizeof(unsigned) * CHAR_BIT - 1);
    // or, optionally
    return n < 0;
};

int set_msb(int n, int msb){
    if (msb)
         return ((unsigned)n) |  (1ULL << (sizeof(unsigned) * CHAR_BIT - 1));
    else return ((unsigned)n) & ~(1ULL << (sizeof(unsigned) * CHAR_BIT - 1));
};

It takes care of endianness, number of bits in a byte, and works also on 1's complement.

ruslik
  • 14,714
  • 1
  • 39
  • 40
0
#define HIGH_BIT(inttype) (((inttype)1) << (CHAR_BIT * sizeof(inttype) - 1))

example usage:

ptrdiff_t i = 4711;
i |=  HIGH_BIT(ptrdiff_t);  /* set high bit */
i &= ~HIGH_BIT(ptrdiff_t);  /* clear high bit */
rapm
  • 1
  • 1