3

I have two unsigned ints X and Y, and I want to efficiently decide if X is at most half as long as Y, where the length of X is k+1, where 2^k is the largest power of 2 that is no larger than X.

i.e., X=0000 0101 has length 3, Y=0111 0000 is more than twice as long as X.

Obviously we can check by looking at individual bits in X and Y, for example by shifting right and counting in a loop, but is there an efficient, bit-twiddling (and loop-less) solution?

The (toy) motivation comes from the fact that I want to divide the range RAND_MAX either into range buckets or RAND_MAX/range buckets, plus some remainder, and I prefer use the larger number of buckets. If range is (approximately) at most the square root of RAND_MAX (i.e., at most half as long), than I prefer using RAND_MAX/range buckets, and otherwise I want to use range buckets.

It should be noted, therefore, that X and Y might be large, where possibly Y=1111 1111, in the 8-bit example above. We certainly don't want to square X.

Edit, post-answer: The answer below mentions the built-in count leading zeros function (__builtin_clz()), and that is probably the fastest way to compute the answer. If for some reason this is unavailable, the lengths of X and Y can be obtained through some well-known bit twiddling.

First, smear the bits of X to the right (filling X with 1s except its leading 0s), and then do a population count. Both of these operations involve O(log k) operations, where k is the number of bits that X occupies in memory (my examples are for uint32_t, 32 bit unsigned integers). There are various implementations, but I put the ones that are easiest to understand below:

//smear
x = x | x>>1;
x = x | x>>2;
x = x | x>>4;
x = x | x>>8;
x = x | x>>16;

//population count
x = ( x & 0x55555555 ) + ( (x >> 1 ) & 0x55555555 );
x = ( x & 0x33333333 ) + ( (x >> 2 ) & 0x33333333 );
x = ( x & 0x0F0F0F0F ) + ( (x >> 4 ) & 0x0F0F0F0F );
x = ( x & 0x00FF00FF ) + ( (x >> 8 ) & 0x00FF00FF );
x = ( x & 0x0000FFFF ) + ( (x >> 16) & 0x0000FFFF );

The idea behind the population count is to divide and conquer. For example with 01 11, I first count the 1-bits in 01: there is 1 1-bit on the right, and there are 0 1-bits on the left, so I record that as 01 (in place). Similarly, 11 becomes 10, so the updated bit-string is 01 10, and now I will add the numbers in buckets of size 2, and replace the pair of them with the result; 1+2=3, so the bit string becomes 0011, and we are done. The original bit-string is replaced with the population count.

There are faster ways to do the pop count given in Hacker's Delight, but this one is easier to explain, and seems to be the basis for most of the others. You can get my code as a Gist here..

X=0000 0000 0111 1111 1000 1010 0010 0100 
Set every bit that is 1 place to the right of a 1
0000 0000 0111 1111 1100 1111 0011 0110 
Set every bit that is 2 places to the right of a 1
0000 0000 0111 1111 1111 1111 1111 1111 
Set every bit that is 4 places to the right of a 1
0000 0000 0111 1111 1111 1111 1111 1111 
Set every bit that is 8 places to the right of a 1
0000 0000 0111 1111 1111 1111 1111 1111 
Set every bit that is 16 places to the right of a 1
0000 0000 0111 1111 1111 1111 1111 1111 
Accumulate pop counts of bit buckets size 2
0000 0000 0110 1010 1010 1010 1010 1010 
Accumulate pop counts of bit buckets size 4
0000 0000 0011 0100 0100 0100 0100 0100 
Accumulate pop counts of bit buckets size 8
0000 0000 0000 0111 0000 1000 0000 1000 
Accumulate pop counts of bit buckets size 16
0000 0000 0000 0111 0000 0000 0001 0000 
Accumulate pop counts of bit buckets size 32
0000 0000 0000 0000 0000 0000 0001 0111 

The length of 8358436 is 23 bits

Y=0000 0000 0000 0000 0011 0000 1010 1111 
Set every bit that is 1 place to the right of a 1
0000 0000 0000 0000 0011 1000 1111 1111 
Set every bit that is 2 places to the right of a 1
0000 0000 0000 0000 0011 1110 1111 1111 
Set every bit that is 4 places to the right of a 1
0000 0000 0000 0000 0011 1111 1111 1111 
Set every bit that is 8 places to the right of a 1
0000 0000 0000 0000 0011 1111 1111 1111 
Set every bit that is 16 places to the right of a 1
0000 0000 0000 0000 0011 1111 1111 1111 
Accumulate pop counts of bit buckets size 2
0000 0000 0000 0000 0010 1010 1010 1010 
Accumulate pop counts of bit buckets size 4
0000 0000 0000 0000 0010 0100 0100 0100 
Accumulate pop counts of bit buckets size 8
0000 0000 0000 0000 0000 0110 0000 1000 
Accumulate pop counts of bit buckets size 16
0000 0000 0000 0000 0000 0000 0000 1110 
Accumulate pop counts of bit buckets size 32
0000 0000 0000 0000 0000 0000 0000 1110 

The length of 12463 is 14 bits

So now I know that 12463 is significantly larger than the square root of 8358436, without taking square roots, or casting to floats, or dividing or multiplying.

See also Stackoverflow and Haacker's Delight (it's a book, of course, but I linked to some snippets on their website).

Community
  • 1
  • 1
Alejandro
  • 623
  • 6
  • 16

1 Answers1

3

If you are dealing with unsigned int and sizeof(unsigned long long) >= sizeof(unsigned int), you can just use the square method after casting:

(unsigned long long)X * (unsigned long long)X <= (unsigned long long)Y

If not, you can still use the square method if X is less than the square root of UINT_MAX+1, which you may need to hard code in the function.

Otherwise, you could use floating point calculation:

sqrt((double)Y) >= (double)X

On modern CPUs, this would be quite fast anyway.

If you are OK with gcc extensions, you can use __builtin_clz() to compute the length of X and Y:

int length_of_X = X ? sizeof(X) * CHAR_BIT - __builtin_clz(X) : 0;
int length_of_Y = Y ? sizeof(Y) * CHAR_BIT - __builtin_clz(Y) : 0;
return length_of_X * 2 <= length_of_Y;

__buitin_clz() compiles to a single instruction on modern Intel CPUs.

Here is a discussion on more portable ways to count leading zeroes you could use to implement your length function: Counting leading zeros in a 32 bit unsigned integer with best algorithm in C programming or this one: Implementation of __builtin_clz

Community
  • 1
  • 1
chqrlie
  • 131,814
  • 10
  • 121
  • 189
  • Type-O: `sqrt((double)Y) >= (double)X`. This works, of course, but it's not exactly the answer I'm looking for, so I'll hold out for a bit :) (comment on unedited answer) – Alejandro Jul 31 '16 at 23:04
  • Some discussion of __builtin_clz() here http://stackoverflow.com/questions/9353973/implementation-of-builtin-clz . Thanks, I've learnt about a new thing! (gcc extensions ^^) – Alejandro Jul 31 '16 at 23:12
  • I would personally go with the multiplication option, and it would be much faster than the sqrt option – Nick Mertin Aug 01 '16 at 00:07
  • The square method doesn't work does it? E.g. X = 5, Y = 30. 101 is more than half as long as 11110 but the test will pass. – samgak Aug 01 '16 at 00:26
  • @samgak Question says 'at most half as long' so that should not pass. – Markus Laire Aug 01 '16 at 01:26
  • @MarkusLaire yes, it should not pass. But it does pass, because 5 * 5 <= 30 – samgak Aug 01 '16 at 01:29
  • 1
    @samgak: if the test is strictly based on the position of the most significant bit, the last solution is the way to go. If the bit twiddling is just used as a quick approximation for the amplitude test, the multiplication (if available) of the square root can be used. The OP should do some benchmarking and make an educated choice. – chqrlie Aug 01 '16 at 11:05