C: Finding the bin index which a value belongs to

Question

I have bins where the ranges grow exponentially.

Bin 0 -> [0 <= x <= 10] (interval = 10)
Bin 1 -> [11 <= x <= 30] (interval = 20)
Bin 2 -> [31 <= x <= 70] (interval = 40)
Bin 3 -> [71 <= x <= 150] (interval = 80)
Bin 4 -> [151 <= x <= 310] (interval = 160)

... and so on.

The number of bins and first interval are known prior (in this case it is 5, and 10 respectively). x lowest possible value is 0.

What I am currently doing is a standard for-loop that multiplies by 2 each time, and then return the index of the bin if value is within the range.

Is there a smarter way of doing this?

Are you dealing with integers? Are there 10 or 11 elements in bin 0? — MFisherKDX, Jun 19 '19 at 23:44
Apart from `0`, the `bin = floor(log2( 1 + ((x-1) / 10) ));` — Weather Vane, Jun 19 '19 at 23:53
You might get rounding errors using `log2()` and an integer computation is probably quicker. For example if the log is `1.999999999999` then you will get `1`. — Weather Vane, Jun 20 '19 at 00:02
Since the base is 2 and the input and output are integers, a function such as `__builtin_clz` could be used, unfortunately these are not portable. — harold, Jun 20 '19 at 00:07
Unless there are a large number of bins so that looping too many times is a performance problem, I'd just stick with the loop. — Barmar, Jun 20 '19 at 00:36

Lior Kogan · Answer 1 · 2019-06-20T19:47:50.337

As suggested by Weather Vane,

bin = floor( log2 (        (value + 9) / 10   ))

also:

bin = floor( log2 ( floor( (value + 9) / 10 ) ))

Integer division (i1/i2) in C is equal to trunc(i1/i2) (truncation toward zero), which is equivalent to floor(i1/i2) for non-negative integers, so there is no need to implement the inner floor.

floor(log2(i)) can be implemented quite efficiently. See the accepted answer here for a fast 32-bit and 64-bit integer implementations.

Here is the code (valid when int is 32-bit). OnlineGDB

#include <stdio.h>

const unsigned int tab32[32] = {
     0,  9,  1, 10, 13, 21,  2, 29,
    11, 14, 16, 18, 22, 25,  3, 30,
     8, 12, 20, 28, 15, 17, 24,  7,
    19, 27, 23,  6, 26,  5,  4, 31};

unsigned int log2_32(unsigned int value)
{
    value |= value >>  1;
    value |= value >>  2;
    value |= value >>  4;
    value |= value >>  8;
    value |= value >> 16;
    return tab32[(value * 0x07C4ACDD) >> 27];
}

int main()
{
    unsigned int value = 151;

    unsigned int bin = 0;
    if (value > 0)
        bin = log2_32( (value + 9) / 10 );

    printf("value: %u, bin: %u", value, bin);

    return 0;
}

The intervals are in GP . So the nth bucket boundary would be sum of the intervals =start_interval * (2^n+1) -1. An element to be the upper boundary it should be the multiple of start_interval so do x = x+x%start_interval ; then x =x%start_interval. the resulting x should be a power of 2 to be the upper boundary of an interval so x <= 2^n where n is the bucket no. ceil(logx) would give n. Isn't the above solution a mathematical representation of similar logic? — Christina Jacob, Jun 20 '19 at 05:11

Gene · Accepted Answer · 2019-06-20T07:38:27.753

The log arithmetic is cool, but you could also just build a table and use binary search to find the bin.

int find_bin(unsigned x) {
 static unsigned bin_top[] = {
    10u, // 0
    30u, // 1
    70u, // 2
    150u, // 3
    310u, // 4
    630u, // 5
    1270u, // 6
    2550u, // 7
    5110u, // 8
    10230u, // 9
    20470u, // 10
    40950u, // 11
    81910u, // 12
    163830u, // 13
    327670u, // 14
    655350u, // 15
    1310710u, // 16
    2621430u, // 17
    5242870u, // 18
    10485750u, // 19
    20971510u, // 20
    41943030u, // 21
    83886070u, // 22
    167772150u, // 23
    335544310u, // 24
    671088630u, // 25
    1342177270u, // 26
    2684354550u, // 27
    4294967295u, // 28
  };
  int lo = 0, hi = 28;
  while (lo < hi) {
    int mid = (lo + hi) / 2;
    if (mid > 0 && x <= bin_top[mid - 1]) hi = mid - 1;
    else if (x > bin_top[mid]) lo = mid + 1;
    else return mid;
  }
  return lo;
}

The loop will never run more than five iterations, so it's pretty quick. Bin 28 is the biggest unsigned value, which isn't the calculated one.

C: Finding the bin index which a value belongs to

2 Answers2