
There are various macro-based solutions out there to compute the integer-valued log2 at compile time, but what if you need a bit more precision than what you get with integers, i.e. a few binary places after the binary point? It doesn't matter whether the generated value is a floating-point expression or a fixed-point scaled-integer expression, as long as it evaluates to a compile-time constant.

What is this useful for, you may ask? The application I had in mind was to compute the number of bits needed to optimally pack a structure whose fields have value sets that don't span a power of 2 range.
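For instance (my own illustration of that use case, not from the original question): three fields that can take 6, 10 and 100 distinct values need ceil(log2 6) + ceil(log2 10) + ceil(log2 100) = 3 + 4 + 7 = 14 bits as separate bit-fields, but packed together as one mixed-radix value they fit in ceil(log2 6 + log2 10 + log2 100) = ceil(12.55) = 13 bits - and summing those fractional logarithms at compile time is exactly where a constant-expression log2 with binary places comes in.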

  • As an extension, gcc will let you use the floating-point `log2()` function in compile-time constant expressions. [Example](https://godbolt.org/z/15dKT4). clang doesn't, though, unless it needs some option that I couldn't find. – Nate Eldredge Dec 02 '20 at 22:23
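A minimal illustration of the gcc behaviour mentioned in the comment above (my own sketch, not taken from the linked example; folding a libm call in a file-scope initializer is a gcc extension, not standard C, and clang rejects it):

#include <math.h>

/* gcc constant-folds the call as an extension; this is not a valid
   constant expression in strict ISO C */
static const double lg = log2(6000.0);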

1 Answer


I've come up with the following - it's not particularly inventive, but it works, and doesn't slow the compilation down to a crawl. In other words, it does what I needed it to do: add a couple (here: a dozen) binary places' worth of precision after the binary point.

It works by splitting the number into its largest power of 2 and a remaining factor, then trial-multiplying by coefficients whose log2 values are binary fractions (of magnitude 2^-k). Multiplying by such coefficients is equivalent to adding their logarithms together, and thus the FRAC_LOG2 macro expands to a sum whose terms are selected by nested ternary expressions.
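For example (my own walk-through of the first few steps): for n = 36, BIT_SCAN_REV gives 5, so m = 32 and the fractional part to recover is log2(36/32) ≈ 0.1699. The first test, 32 <= 36*IROOT2_1 (≈ 25.46), fails, so 1/2 is not added; 32 <= 36*IROOT2_2 (≈ 30.27) fails as well; 32 <= 36*IROOT2_3 (≈ 33.01) succeeds, so 1/8 is added and n is replaced by 36*IROOT2_3 ≈ 33.01. The remaining nine steps keep refining the fraction the same way, ending up at about 5.1699 for the whole thing.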

#define IROOT2_1 .7071067812 // 2^-(2^-1)
#define IROOT2_2 .8408964153 // 2^-(2^-2)
#define IROOT2_3 .9170040432 // 2^-(2^-3)
#define IROOT2_4 .9576032807 // 2^-(2^-4)
#define IROOT2_5 .9785720621 // 2^-(2^-5)
#define IROOT2_6 .9892280132 // 2^-(2^-6)
#define IROOT2_7 .9945994235 // 2^-(2^-7)
#define IROOT2_8 .9972960561 // 2^-(2^-8)
#define IROOT2_9 .9986471129 // 2^-(2^-9)
#define IROOT2_A .9993233275 // 2^-(2^-10)
#define IROOT2_B .9996616065 // 2^-(2^-11)
#define IROOT2_C .9998307889 // 2^-(2^-12)

// BIT_SCAN_REV(n): integer part of log2(n), i.e. the index of the highest set bit (n in 1..65535)
#define BIT_SCAN_REV(n) \
(n>>15?15:n>>14?14:n>>13?13:n>>12?12:n>>11?11:n>>10?10:n>>9?9:\
n>>8?8:n>>7?7:n>>6?6:n>>5?5:n>>4?4:n>>3?3:n>>2?2:n>>1?1:0)

// Each FRAC_LOG2_k(m,n) step tests whether n/m is still at least 2^(2^-k); if so it
// adds 2^-k to the result (as 2048, 1024, ... in units of 1/4096) and strips that
// factor from n. m stays fixed at 2^BIT_SCAN_REV of the original value.
#define FRAC_LOG2_1(m,n) (1./4096.)*\
                         ((m<=n*IROOT2_1?2048:0)+FRAC_LOG2_2(m,n*(m<=n*IROOT2_1?IROOT2_1:1)))
#define FRAC_LOG2_2(m,n) ((m<=n*IROOT2_2?1024:0)+FRAC_LOG2_3(m,n*(m<=n*IROOT2_2?IROOT2_2:1)))
#define FRAC_LOG2_3(m,n)  ((m<=n*IROOT2_3?512:0)+FRAC_LOG2_4(m,n*(m<=n*IROOT2_3?IROOT2_3:1)))
#define FRAC_LOG2_4(m,n)  ((m<=n*IROOT2_4?256:0)+FRAC_LOG2_5(m,n*(m<=n*IROOT2_4?IROOT2_4:1)))
#define FRAC_LOG2_5(m,n)  ((m<=n*IROOT2_5?128:0)+FRAC_LOG2_6(m,n*(m<=n*IROOT2_5?IROOT2_5:1)))
#define FRAC_LOG2_6(m,n)   ((m<=n*IROOT2_6?64:0)+FRAC_LOG2_7(m,n*(m<=n*IROOT2_6?IROOT2_6:1)))
#define FRAC_LOG2_7(m,n)   ((m<=n*IROOT2_7?32:0)+FRAC_LOG2_8(m,n*(m<=n*IROOT2_7?IROOT2_7:1)))
#define FRAC_LOG2_8(m,n)   ((m<=n*IROOT2_8?16:0)+FRAC_LOG2_9(m,n*(m<=n*IROOT2_8?IROOT2_8:1)))
#define FRAC_LOG2_9(m,n)    ((m<=n*IROOT2_9?8:0)+FRAC_LOG2_A(m,n*(m<=n*IROOT2_9?IROOT2_9:1)))
#define FRAC_LOG2_A(m,n)    ((m<=n*IROOT2_A?4:0)+FRAC_LOG2_B(m,n*(m<=n*IROOT2_A?IROOT2_A:1)))
#define FRAC_LOG2_B(m,n)    ((m<=n*IROOT2_B?2:0)+FRAC_LOG2_C(m,n*(m<=n*IROOT2_B?IROOT2_B:1)))
#define FRAC_LOG2_C(m,n)     (m<=n*IROOT2_C?1:0)

// FRAC_LOG2(n): compile-time log2(n) with 12 binary places after the point;
// n must be an integral constant expression in the range 1..65535.
#define FRAC_LOG2(n) (BIT_SCAN_REV(n) + FRAC_LOG2_1(1<<BIT_SCAN_REV(n), n))
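Usage is then a single macro invocation, e.g. (my own check of the result):

static const double lg36 = FRAC_LOG2(36); // about 5.1699; the exact value is log2(36) = 5.16993...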

It's not exactly cheap, of course - for a 2-digit number it expands to about 700 kB of source that the compiler has to dig through - but the fractional part is resolved to 2^-12 ≈ 0.00024, i.e. between 3 and 4 fractional decimal digits.

A workaround is to store the integral result of BIT_SCAN_REV in an enum, so that each reference to it is a short identifier instead of the roughly-170-character macro expansion:

enum { 
  input = 36,
  bsr = BIT_SCAN_REV(input),
  bsr_ = 1<<bsr,
};

static const float output = bsr + FRAC_LOG2_1(bsr_, input);

Another way of doing this at a much lower memory cost, without recursive macros, would require an include file to be used any time a value is to be computed.
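A hypothetical sketch of what such an include file could look like (my own construction, untested - the file name, the FL2_* names, and the 6-step Q12 fixed-point format are all made up for illustration): the caller defines FL2_IN, includes the file, and reads the Q12 result from FL2_OUT. Because every refinement step refers to the previous step by a short enum name, nothing is expanded recursively.

/* frac_log2_q12.h - hypothetical sketch, relies on BIT_SCAN_REV from above.
 * Input:  FL2_IN  - an integer constant expression in 1..65535
 * Output: FL2_OUT - log2(FL2_IN) in Q12, i.e. log2(FL2_IN) =~ FL2_OUT/4096.0
 * Only 6 fractional steps are shown; extending to 12 follows the same pattern.
 * Reusing it for a second value would need a name-prefix scheme or #undefs. */
enum {
  FL2_BSR = BIT_SCAN_REV(FL2_IN),
  FL2_M0  = ((FL2_IN) << 12) >> FL2_BSR,   /* Q12 mantissa, in [4096, 8192) */

  /* each step: if the mantissa is still >= 2^(2^-k) (pre-scaled to Q12),
     claim the 2^-k bit and divide the mantissa by that factor */
  FL2_B1 = FL2_M0 >= 5793 ? 2048 : 0,  FL2_M1 = FL2_B1 ? FL2_M0 * 4096 / 5793 : FL2_M0,
  FL2_B2 = FL2_M1 >= 4871 ? 1024 : 0,  FL2_M2 = FL2_B2 ? FL2_M1 * 4096 / 4871 : FL2_M1,
  FL2_B3 = FL2_M2 >= 4467 ?  512 : 0,  FL2_M3 = FL2_B3 ? FL2_M2 * 4096 / 4467 : FL2_M2,
  FL2_B4 = FL2_M3 >= 4277 ?  256 : 0,  FL2_M4 = FL2_B4 ? FL2_M3 * 4096 / 4277 : FL2_M3,
  FL2_B5 = FL2_M4 >= 4186 ?  128 : 0,  FL2_M5 = FL2_B5 ? FL2_M4 * 4096 / 4186 : FL2_M4,
  FL2_B6 = FL2_M5 >= 4141 ?   64 : 0,

  FL2_OUT = FL2_BSR * 4096 + FL2_B1 + FL2_B2 + FL2_B3 + FL2_B4 + FL2_B5 + FL2_B6,
};

Using it would then look like this (again, hypothetical):

#define FL2_IN 36
#include "frac_log2_q12.h"
/* FL2_OUT/4096.0 is now roughly 5.16 (the exact log2(36) is 5.1699...) */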

  • Nice, however I would expect bit masking instead of the nested `<` conditions ... but I am not very familiar with compile-time expressions either ... :) the best I ever did in the field was inverting a square matrix of any dimensionality as a template in C++. Here is my approach to the [log](https://stackoverflow.com/a/42108287/2521214) at runtime, for comparison - it looks like you are doing the same thing ... – Spektre Dec 02 '20 at 08:48
  • Since this is done at compile time, and doesn't particularly slow the compiler down, the form that's simplest to understand wins - and it happens to be the simplest expression, too. If you think about it, bit masking also does tests and has nested expressions, and will have many more syntax-tree elements than a string of recursive comparisons. 31 ternary op nodes, 32 binary op nodes, 64 numeric literal nodes - hard to beat that. Remember that the compiler has to build a tree for the whole thing; the tail end, doing a bit at a time, is already pretty tough - there are c*2^12 tree nodes! – Kuba hasn't forgotten Monica Dec 02 '20 at 13:39