Determine Position of Most Signifiacntly Set Bit in a Byte

Question

I have a byte I am using to store bit flags. I need to compute the position of the most significant set bit in the byte.

Example Byte: 00101101 => 6 is the position of the most significant set bit

Compact Hex Mapping:

[0x00]      => 0x00
[0x01]      => 0x01
[0x02,0x03] => 0x02
[0x04,0x07] => 0x03
[0x08,0x0F] => 0x04
[0x10,0x1F] => 0x05
[0x20,0x3F] => 0x06
[0x40,0x7F] => 0x07
[0x80,0xFF] => 0x08

TestCase in C:

#include <stdio.h>

unsigned char check(unsigned char b) {
  unsigned char c = 0x08;
  unsigned char m = 0x80;
  do {
    if(m&b) { return  c; }
    else    { c -= 0x01; }
  } while(m>>=1);
  return 0; //never reached
}
int main() {
  unsigned char input[256] = {
    0x00,0x01,0x02,0x03,0x04,0x05,0x06,0x07,0x08,0x09,0x0a,0x0b,0x0c,0x0d,0x0e,0x0f,
    0x10,0x11,0x12,0x13,0x14,0x15,0x16,0x17,0x18,0x19,0x1a,0x1b,0x1c,0x1d,0x1e,0x1f,
    0x20,0x21,0x22,0x23,0x24,0x25,0x26,0x27,0x28,0x29,0x2a,0x2b,0x2c,0x2d,0x2e,0x2f,
    0x30,0x31,0x32,0x33,0x34,0x35,0x36,0x37,0x38,0x39,0x3a,0x3b,0x3c,0x3d,0x3e,0x3f,
    0x40,0x41,0x42,0x43,0x44,0x45,0x46,0x47,0x48,0x49,0x4a,0x4b,0x4c,0x4d,0x4e,0x4f,
    0x50,0x51,0x52,0x53,0x54,0x55,0x56,0x57,0x58,0x59,0x5a,0x5b,0x5c,0x5d,0x5e,0x5f,
    0x60,0x61,0x62,0x63,0x64,0x65,0x66,0x67,0x68,0x69,0x6a,0x6b,0x6c,0x6d,0x6e,0x6f,
    0x70,0x71,0x72,0x73,0x74,0x75,0x76,0x77,0x78,0x79,0x7a,0x7b,0x7c,0x7d,0x7e,0x7f,
    0x80,0x81,0x82,0x83,0x84,0x85,0x86,0x87,0x88,0x89,0x8a,0x8b,0x8c,0x8d,0x8e,0x8f,
    0x90,0x91,0x92,0x93,0x94,0x95,0x96,0x97,0x98,0x99,0x9a,0x9b,0x9c,0x9d,0x9e,0x9f,
    0xa0,0xa1,0xa2,0xa3,0xa4,0xa5,0xa6,0xa7,0xa8,0xa9,0xaa,0xab,0xac,0xad,0xae,0xaf,
    0xb0,0xb1,0xb2,0xb3,0xb4,0xb5,0xb6,0xb7,0xb8,0xb9,0xba,0xbb,0xbc,0xbd,0xbe,0xbf,
    0xc0,0xc1,0xc2,0xc3,0xc4,0xc5,0xc6,0xc7,0xc8,0xc9,0xca,0xcb,0xcc,0xcd,0xce,0xcf,
    0xd0,0xd1,0xd2,0xd3,0xd4,0xd5,0xd6,0xd7,0xd8,0xd9,0xda,0xdb,0xdc,0xdd,0xde,0xdf,
    0xe0,0xe1,0xe2,0xe3,0xe4,0xe5,0xe6,0xe7,0xe8,0xe9,0xea,0xeb,0xec,0xed,0xee,0xef,
    0xf0,0xf1,0xf2,0xf3,0xf4,0xf5,0xf6,0xf7,0xf8,0xf9,0xfa,0xfb,0xfc,0xfd,0xfe,0xff };

  unsigned char truth[256] = {
    0x00,0x01,0x02,0x02,0x03,0x03,0x03,0x03,0x04,0x04,0x04,0x04,0x04,0x04,0x04,0x04, 
    0x05,0x05,0x05,0x05,0x05,0x05,0x05,0x05,0x05,0x05,0x05,0x05,0x05,0x05,0x05,0x05, 
    0x06,0x06,0x06,0x06,0x06,0x06,0x06,0x06,0x06,0x06,0x06,0x06,0x06,0x06,0x06,0x06, 
    0x06,0x06,0x06,0x06,0x06,0x06,0x06,0x06,0x06,0x06,0x06,0x06,0x06,0x06,0x06,0x06, 
    0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07, 
    0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07, 
    0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07, 
    0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07,0x07, 
    0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,
    0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,
    0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,
    0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,
    0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,
    0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,
    0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,
    0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08,0x08};

  int i,r;
  int f = 0;
  for(i=0; i<256; ++i) {
    r=check(input[i]);
    if(r !=(truth[i])) {
      printf("failed %d : 0x%x : %d\n",i,0x000000FF & ((int)input[i]),r);
      f += 1;
    }
  }
  if(!f) { printf("passed all\n");  }
  else   { printf("failed %d\n",f); }
  return 0;
}

I would like to simplify my check() function to not involve looping (or branching preferably). Is there a bit twiddling hack or hashed lookup table solution to compute the position of the most significant set bit in a byte?

A lookup table with 256 entries? Or one with 16 entries and two lookups? — Kerrek SB, Jan 24 '13 at 21:51
I would like a language independent answer, so no compiler specific solutions. I just happen to be using `C` as the language for demonstration. — recursion.ninja, Jan 24 '13 at 21:52
@KerrekSB Those are not **`hashed`** lookup table solutions. — recursion.ninja, Jan 24 '13 at 21:54
You are using `truth[]` to check the `check()`. But why can't `check()` be implemented with `truth[]`? — jxh, Jan 24 '13 at 21:58
I made a solution here with `__builtin_clz()` and it compiles to a single instruction for both ARM and Intel. — Carl Norum, Jan 24 '13 at 22:01
The last comment in [this answer](http://stackoverflow.com/a/671826/315052) is probably the bit hack you are looking for. — jxh, Jan 24 '13 at 22:18
@user315052 the idea is to not use a lookup table at all, and if a lookup table is a necessity, use a hashed lookup table with 9 entries not 256. — recursion.ninja, Jan 24 '13 at 23:34

score 4 · Accepted Answer · edited May 23 '17 at 10:34

4

Your question is about an efficient way to compute log2 of a value. And because you seem to want a solution that is not limited to the C language I have been slightly lazy and tweaked some C# code I have.

You want to compute log2(x) + 1 and for x = 0 (where log2 is undefined) you define the result as 0 (e.g. you create a special case where log2(0) = -1).

static readonly Byte[] multiplyDeBruijnBitPosition = new Byte[] {
  7, 2, 3, 4,
  6, 1, 5, 0
};

public static Byte Log2Plus1(Byte value) {
  if (value == 0)
    return 0;

  var roundedValue = value;
  roundedValue |= (Byte) (roundedValue >> 1);
  roundedValue |= (Byte) (roundedValue >> 2);
  roundedValue |= (Byte) (roundedValue >> 4);
  var log2 = multiplyDeBruijnBitPosition[((Byte) (roundedValue*0xE3)) >> 5];
  return (Byte) (log2 + 1);
}

This bit twiddling hack is taken from Find the log base 2 of an N-bit integer in O(lg(N)) operations with multiply and lookup where you can see the equivalent C source code for 32 bit values. This code has been adapted to work on 8 bit values.

However, you may be able to use an operation that gives you the result using a very efficient built-in function (on many CPU's a single instruction like the Bit Scan Reverse is used). An answer to the question Bit twiddling: which bit is set? has some information about this. A quote from the answer provides one possible reason why there is low level support for solving this problem:

Things like this are the core of many O(1) algorithms such as kernel schedulers which need to find the first non-empty queue signified by an array of bits.

edited May 23 '17 at 10:34

Community

1
1

answered Jan 25 '13 at 00:39

Martin Liversage

104,481
22
209
256

This was the sort of hashed lookup table approach I was looking for. However, I wanted to know if there was a `byte` to `byte` hash function & corresponding lookup table rather then an `int` to `int` hash function. Obviously it will work but takes up 4 times the space and requires casting. – recursion.ninja Jan 25 '13 at 01:59
@awashburn: I have updated the function to use the proper `B(2, 3)` De Bruijn sequence to stay within 8 bits. It took me a while to get my head around the algorithm. – Martin Liversage Jan 25 '13 at 02:26
+1, but note that the float cast solution is using the FPU to compute this log value, since the x86 (IEEE) floating point representation normalizes the mantissa. – jxh Jan 25 '13 at 02:29
@MartinLiversage When I converted the algorithm above to `Java`, the answer did not work, it failed for input `0x01` attempting to access array index `-1`. (index out of bounds) – recursion.ninja Jan 25 '13 at 15:35
@MartinLiversage I got it to work in `Java` it just requires an additional mask of `lookup[(byte)(((roundedValue*0xE3)>>5)&0x07)]`. Java *helps* you by preserving the sign bit. – recursion.ninja Jan 25 '13 at 17:10
@awashburn: In .NET the following rules apply: The `Byte` is implicitly converted to `Int32` before being multiplied with `0xE3` (another `Int32`) and because `Byte` is unsigned there is no sign bit to preserve when converting. – Martin Liversage Jan 25 '13 at 19:16

score 1 · Answer 2 · answered Jan 25 '13 at 02:24

That was a fun little challenge. I don't know if this one is completely portable since I only have VC++ to test with, and I certainly can't say for sure if it's more efficient than other approaches. This version was coded with a loop but it can be unrolled without too much effort.

static unsigned char check(unsigned char b)
{
  unsigned char r = 8;
  unsigned char sub = 1;
  unsigned char s = 7;
  for (char i = 0; i < 8; i++)
  {
      sub = sub & ((( b & (1 << s)) >> s--) - 1);
      r -= sub;
  }
  return r;
}

score 1 · Answer 3 · answered Jan 30 '13 at 20:06

I'm sure everyone else has long since moved on to other topics but there was something in the back of my mind suggesting that there had to be a more efficient branch-less solution to this than just unrolling the loop in my other posted solution. A quick trip to my copy of Warren put me on the right track: Binary search.

Here's my solution based on that idea:

  Pseudo-code:

  // see if there's a bit set in the upper half   
  if ((b >> 4) != 0)  
  {
      offset = 4;
      b >>= 4;   
  }   
  else
      offset = 0;

  // see if there's a bit set in the upper half of what's left   
  if ((b & 0x0C) != 0)   
  {
    offset += 2;
    b >>= 2;   
  }

  // see if there's a bit set in the upper half of what's left   
  if > ((b & 0x02) != 0)   
  {
    offset++;
    b >>= 1;   
  }

  return b + offset;

Branch-less C++ implementation:

static unsigned char check(unsigned char b)
{    
  unsigned char adj = 4 & ((((unsigned char) - (b >> 4) >> 7) ^ 1) - 1);
  unsigned char offset = adj;
  b >>= adj;
  adj = 2 & (((((unsigned char) - (b & 0x0C)) >> 7) ^ 1) - 1);
  offset += adj;
  b >>= adj;
  adj = 1 & (((((unsigned char) - (b & 0x02)) >> 7) ^ 1) - 1);
  return (b >> adj) + offset + adj;
}

Yes, I know that this is all academic :)

If this passes all the test cases I'll do some brief performance comparisons between your answer and Martin's and accept the most efficient code. If it is close, I'll take space into consideration, favoring yours for being completely arithmetic and not requiring a lookup table. — recursion.ninja, Feb 01 '13 at 21:21
After 2^30 function calls: Martin's: `0m30.341s` Yours: `0m51.987s`. On average Martin's answer +50% faster then yours. You get brownie points for no branching though ;) — recursion.ninja, Feb 01 '13 at 22:07
What does `(unsigned char) - (b & 0x0C)` do? How is `(unsigned char)` handled as a variable? — recursion.ninja, Feb 01 '13 at 22:09
The minus forces a type-change from unsigned char (byte) to int. The (unsigned char) cast converts the result of the negation back to byte-size. Of course, the negation is meant to set the sign bit if the result of the AND was non-zero, and clear it otherwise. This bit is then reversed and turned into a mask (all zeroes or all ones) by subtracting one. — 500 - Internal Server Error, Feb 01 '13 at 22:31

Paolo · Answer 4 · 2013-01-24T23:43:35.047

0

It is not possible in plain C. The best I would suggest is the following implementation of check. Despite quite "ugly" I think it runs faster than the ckeck version in the question.

int check(unsigned char b)
{
    if(b&128) return 8;
    if(b&64)  return 7;
    if(b&32)  return 6;
    if(b&16)  return 5;
    if(b&8)   return 4;
    if(b&4)   return 3;
    if(b&2)   return 2;
    if(b&1)   return 1;
              return 0;
}

edited Jan 24 '13 at 23:43

answered Jan 24 '13 at 23:20

Paolo

15,233
27
70
91

Did you see the OP's request to avoid branching? – Carl Norum Jan 24 '13 at 23:22
@Paolo Preferably means *as a last resort*, I'm really looking for a bit twiddling hack. – recursion.ninja Jan 24 '13 at 23:29

jxh · Answer 5 · 2013-01-25T00:39:51.640

0

Edit: I found a link to the actual code: http://www.hackersdelight.org/hdcodetxt/nlz.c.txt The algorithm below is named nlz8 in that file. You can choose your favorite hack.

/*
From last comment of: http://stackoverflow.com/a/671826/315052
> Hacker's Delight explains how to correct for the error in 32-bit floats
> in 5-3 Counting Leading 0's. Here's their code, which uses an anonymous
> union to overlap asFloat and asInt: k = k & ~(k >> 1); asFloat =
> (float)k + 0.5f; n = 158 - (asInt >> 23); (and yes, this relies on
> implementation-defined behavior) - Derrick Coetzee Jan 3 '12 at 8:35
*/

unsigned char check (unsigned char b) {
    union {
        float    asFloat;
        int      asInt;
    } u;
    unsigned k = b & ~(b >> 1);
    u.asFloat = (float)k + 0.5f;
    return 32 - (158 - (u.asInt >> 23));
}

Edit -- not exactly sure what the asker means by language independent, but below is the equivalent code in python.

import ctypes

class Anon(ctypes.Union):
    _fields_ = [
        ("asFloat", ctypes.c_float),
        ("asInt", ctypes.c_int)
    ]

def check(b):
    k = int(b) & ~(int(b) >> 1)
    a = Anon(asFloat=(float(k) + float(0.5)))
    return 32 - (158 - (a.asInt >> 23))

edited Jan 25 '13 at 00:39

answered Jan 24 '13 at 23:41

jxh

69,070
8
110
193

Your answer works, however I don't want to accept an answer using `union` as it is not language independent. `check()` should also return an `unsigned char` not an `int`. Perhaps I can look how the algorithm above manipulates the bits to find a specific `byte` to `byte` solution. – recursion.ninja Jan 24 '13 at 23:49
1

@awashburn: Do you want to remove the C tag from the question then? Or, perhaps I don't know what you mean by language independent. – jxh Jan 24 '13 at 23:53
It means using core primitive operators that are shared amongst all languages. `-`,`+`,`*`,`/`,`&`,`|`,`^`,`~`,etc – recursion.ninja Jan 25 '13 at 00:51
Is casting considered a *core primitive operator* ? – wildplasser Jan 25 '13 at 01:05
I'd *prefer* to avoid `casting` but every higher level language I can think of supports casting... I would consider it a core primitive operator... that I want to avoid. – recursion.ninja Jan 25 '13 at 01:55
@awashburn: That's a fine requirement, but then the solution does not have anything to do with C. That said, I already showed the solution in Python. You even have someone posting a solution in Java now. Probably better to remove the C tag. – jxh Jan 25 '13 at 02:24
I have removed the `C` tag – recursion.ninja Jan 25 '13 at 02:43

Determine Position of Most Signifiacntly Set Bit in a Byte

5 Answers5