0

I have bins where the ranges grow exponentially.

Bin 0 -> [0 <= x <= 10] (interval = 10)
Bin 1 -> [11 <= x <= 30] (interval = 20)
Bin 2 -> [31 <= x <= 70] (interval = 40)
Bin 3 -> [71 <= x <= 150] (interval = 80)
Bin 4 -> [151 <= x <= 310] (interval = 160)

... and so on.

The number of bins and first interval are known prior (in this case it is 5, and 10 respectively). x lowest possible value is 0.

What I am currently doing is a standard for-loop that multiplies by 2 each time, and then return the index of the bin if value is within the range.

Is there a smarter way of doing this?

i_use_the_internet
  • 604
  • 1
  • 9
  • 28

2 Answers2

2

As suggested by Weather Vane,

bin = floor( log2 (        (value + 9) / 10   ))

also:

bin = floor( log2 ( floor( (value + 9) / 10 ) ))

Integer division (i1/i2) in C is equal to trunc(i1/i2) (truncation toward zero), which is equivalent to floor(i1/i2) for non-negative integers, so there is no need to implement the inner floor.

floor(log2(i)) can be implemented quite efficiently. See the accepted answer here for a fast 32-bit and 64-bit integer implementations.

Here is the code (valid when int is 32-bit). OnlineGDB

#include <stdio.h>

const unsigned int tab32[32] = {
     0,  9,  1, 10, 13, 21,  2, 29,
    11, 14, 16, 18, 22, 25,  3, 30,
     8, 12, 20, 28, 15, 17, 24,  7,
    19, 27, 23,  6, 26,  5,  4, 31};

unsigned int log2_32(unsigned int value)
{
    value |= value >>  1;
    value |= value >>  2;
    value |= value >>  4;
    value |= value >>  8;
    value |= value >> 16;
    return tab32[(value * 0x07C4ACDD) >> 27];
}

int main()
{
    unsigned int value = 151;

    unsigned int bin = 0;
    if (value > 0)
        bin = log2_32( (value + 9) / 10 );

    printf("value: %u, bin: %u", value, bin);

    return 0;
}
Lior Kogan
  • 19,919
  • 6
  • 53
  • 85
  • The intervals are in GP . So the nth bucket boundary would be sum of the intervals =start_interval * (2^n+1) -1. An element to be the upper boundary it should be the multiple of start_interval so do x = x+x%start_interval ; then x =x%start_interval. the resulting x should be a power of 2 to be the upper boundary of an interval so x <= 2^n where n is the bucket no. ceil(logx) would give n. Isn't the above solution a mathematical representation of similar logic? – Christina Jacob Jun 20 '19 at 05:11
  • @ChristinaJacob: yes, it is – Lior Kogan Jun 20 '19 at 12:10
1

The log arithmetic is cool, but you could also just build a table and use binary search to find the bin.

int find_bin(unsigned x) {
 static unsigned bin_top[] = {
    10u, // 0
    30u, // 1
    70u, // 2
    150u, // 3
    310u, // 4
    630u, // 5
    1270u, // 6
    2550u, // 7
    5110u, // 8
    10230u, // 9
    20470u, // 10
    40950u, // 11
    81910u, // 12
    163830u, // 13
    327670u, // 14
    655350u, // 15
    1310710u, // 16
    2621430u, // 17
    5242870u, // 18
    10485750u, // 19
    20971510u, // 20
    41943030u, // 21
    83886070u, // 22
    167772150u, // 23
    335544310u, // 24
    671088630u, // 25
    1342177270u, // 26
    2684354550u, // 27
    4294967295u, // 28
  };
  int lo = 0, hi = 28;
  while (lo < hi) {
    int mid = (lo + hi) / 2;
    if (mid > 0 && x <= bin_top[mid - 1]) hi = mid - 1;
    else if (x > bin_top[mid]) lo = mid + 1;
    else return mid;
  }
  return lo;
}

The loop will never run more than five iterations, so it's pretty quick. Bin 28 is the biggest unsigned value, which isn't the calculated one.

Gene
  • 46,253
  • 4
  • 58
  • 96