5

I need to compute the mathematical expression floor(ln(u)/ln(1-p)) for 0 < u < 1 and 0 < p < 1 in C on an embedded processor with no floating-point arithmetic and no ln function. The result is a positive integer. I know about the limit cases (p=0); I'll deal with them later...

I imagine that the solution involves having u and p range over 0..UINT16_MAX, and appealing to a lookup table for the logarithm, but I cannot figure out how exactly: what does the lookup table map to?

The result need not be 100% exact; approximations are OK.

Thanks!

mqtthiqs
  • How are you representing `u` and `p`? – Oliver Charlesworth Aug 22 '15 at 18:16
  • Well, I'm not at the moment, since I didn't solve this. Ideally, they would be `float`s, but they're not available, so `uint16_t` is a candidate... – mqtthiqs Aug 22 '15 at 18:17
  • This question is a bit broad, then (especially as "not 100% exact" could mean anything...). You could use a `uint16_t` to represent (e.g.) `u * 65536`. You could then (in theory) use a lookup table that returns `ln(u)`. You could then implement an integer division. – Oliver Charlesworth Aug 22 '15 at 18:18
  • Ok, that would be great, but ln(u) ranges from -inf to 0, with a lot of information close to 0, so if I round off to the closest integer I'll lose valuable info! See what I mean? `round(ln(u)) = 0` for any `u > 0.7`, so the division would give 0 for all these `u`... – mqtthiqs Aug 22 '15 at 18:23
  • Sorry if the question is too broad... I'm a neophyte in these questions and it's a bit difficult to narrow down what my problem is exactly... – mqtthiqs Aug 22 '15 at 18:25
  • Sure. You have a trade-off (essentially between dynamic range and resolution). You could also represent the output of the lookup table as a scaled value. – Oliver Charlesworth Aug 22 '15 at 18:25
  • So you don't have a floating-point unit, but your `u` and `p` are still floats, right? Otherwise `ln(1-p)` doesn't make a lot of sense... – EOF Aug 22 '15 at 18:25
  • EOF, `u` and `p` are my "mathematical" values (in **R**); I don't know yet how to represent them in the C code (but not as `float`s, since I don't have them available) – mqtthiqs Aug 22 '15 at 18:27
  • Do you have limits on the domain for `u` and `p`, for example `u < 1.0`? – EOF Aug 22 '15 at 18:29
  • `0 < u < 1` and `0 < p < 1` (I'm making the bounds strict to avoid undefinedness, but it might be a bit larger than this) – mqtthiqs Aug 22 '15 at 18:31
  • Maybe your processor has a fast `count leading zeros` instruction, which can be used to approximate `log base2`. – EOF Aug 22 '15 at 18:36 (a sketch of this idea follows the comment thread)
  • Apart from using a lookup table, perhaps you can implement fixed-point arithmetic using the series `ln(1+x) = x - x^2/2 + x^3/3 - x^4/4 ...` where `-1 < x <= 1` – Weather Vane Aug 22 '15 at 18:37
  • Can you explain more about the application? Where do p and u come from? What is the result used for? This may influence what approach is best. – Sigve Kolbeinson Aug 22 '15 at 21:45
  • Sure. This formula transforms a random number with a uniform probability `u` into one with a geometric probability. `p` is the probability parameter (how far off the next successful event is likely to be). I am using this to generate a series of geometrically distributed random numbers on an STM32F1 processor, which does not support floats. My end application is an electronic music instrument. – mqtthiqs Aug 22 '15 at 22:36
  • @mqtthiqs Does this CPU have fast integer multiplies (if so which flavors: 16x16->32, 32x32->32,...)? Does it provide a "count leading zeros" instruction? Are you limited to 16-bit operations or can you also use 32-bit operations? Any limitations on table size due to memory size constraints? – njuffa Aug 23 '15 at 03:32
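
To illustrate the `count leading zeros` suggestion from the comments: a minimal sketch, assuming a GCC-style compiler, where `__builtin_clz` typically compiles to the single `CLZ` instruction on a Cortex-M3 such as the STM32F1. It only yields the integer part of log2, so it would still have to be combined with a fractional refinement like the ones in the answers below.

#include <stdint.h>

/* floor(log2(x)) for x > 0; __builtin_clz is undefined for x == 0 */
static inline uint32_t ilog2_u32 (uint32_t x)
{
    return 31u - (uint32_t) __builtin_clz (x);
}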

2 Answers

16

Since the logarithm is used in both dividend and divisor, there is no need to use log(); we can use log2() instead. Due to the restrictions on the inputs u and p, the logarithms are known to both be negative, so we can restrict ourselves to computing the positive quantity -log2().

We can use fixed-point arithmetic to compute the logarithm. We do so by multiplying the original input by a sequence of factors of decreasing magnitude that approach 1. Considering each factor in sequence, we multiply the input only by those factors that result in a product closer to 1, but without exceeding it. While doing so, we sum the log2() of the factors that "fit". At the end of this procedure we wind up with a number very close to 1 as our final product, and a sum that represents the binary logarithm.
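
For example, for x = 0.75: no power-of-two factor fits (2 * 0.75 = 1.5 already exceeds 1), and neither does (1+2^-1); multiplying by (1+2^-2) = 1.25 gives 0.9375 (add log2(1.25) ≈ 0.3219); (1+2^-3) overshoots, but (1+2^-4) = 1.0625 gives 0.99609 (add log2(1.0625) ≈ 0.0875); the next factor that fits is (1+2^-8) (add ≈ 0.0056), and so on. The accumulated sum ≈ 0.4150 converges on -log2(0.75) ≈ 0.41504.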

This process is known in the literature as multiplicative normalization or pseudo division, and some early publications describing it are the works by De Lugish and Meggitt. The latter indicates that the origin is basically Henry Briggs's method for computing common logarithms.

B. de Lugish. "A Class of Algorithms for Automatic Evaluation of Functions and Computations in a Digital Computer". PhD thesis, Dept. of Computer Science, University of Illinois, Urbana, 1970.

J. E. Meggitt. "Pseudo division and pseudo multiplication processes". IBM Journal of Research and Development, Vol. 6, No. 2, April 1962, pp. 210-226.

As the chosen set of factors comprises 2^i and (1 + 2^-i), the necessary multiplications can be performed without the need for a multiplication instruction: the products can be computed by either shift or shift plus add.

Since the inputs u and p are purely fractional numbers with 16 bits, we may want to choose a 5.16 fixed-point result for the logarithm. By simply dividing the two logarithm values, we remove the fixed-point scale factor, and apply a floor() operation at the same time, because for positive numbers, floor(x) is identical to trunc(x) and integer division is truncating.

Note that the fixed-point computation of the logarithm results in large relative error for inputs near 1. This in turn means the entire function computed using fixed-point arithmetic may deliver results significantly different from the reference if p is small. An example of this is the following test case: u=55af p=0052 res=848 ref=874.

#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>

/* input x is a 0.16 fixed-point number in [0,1)
   function returns -log2(x) as a 5.16 fixed-point number in (0, 16]
*/   
uint32_t nlog2_16 (uint16_t x)
{
    uint32_t r = 0;
    uint32_t t, a = x;

    /* try factors 2**i with i = 8, 4, 2, 1 */
    if ((t = a << 8       ) < 0x10000) { a = t; r += 0x80000; }
    if ((t = a << 4       ) < 0x10000) { a = t; r += 0x40000; }
    if ((t = a << 2       ) < 0x10000) { a = t; r += 0x20000; }
    if ((t = a << 1       ) < 0x10000) { a = t; r += 0x10000; }
    /* try factors (1+2**(-i)) with i = 1, .., 16 */
    if ((t = a + (a >>  1)) < 0x10000) { a = t; r += 0x095c0; }
    if ((t = a + (a >>  2)) < 0x10000) { a = t; r += 0x0526a; }
    if ((t = a + (a >>  3)) < 0x10000) { a = t; r += 0x02b80; }
    if ((t = a + (a >>  4)) < 0x10000) { a = t; r += 0x01664; }
    if ((t = a + (a >>  5)) < 0x10000) { a = t; r += 0x00b5d; }
    if ((t = a + (a >>  6)) < 0x10000) { a = t; r += 0x005ba; }
    if ((t = a + (a >>  7)) < 0x10000) { a = t; r += 0x002e0; }
    if ((t = a + (a >>  8)) < 0x10000) { a = t; r += 0x00171; }
    if ((t = a + (a >>  9)) < 0x10000) { a = t; r += 0x000b8; }
    if ((t = a + (a >> 10)) < 0x10000) { a = t; r += 0x0005c; }
    if ((t = a + (a >> 11)) < 0x10000) { a = t; r += 0x0002e; }
    if ((t = a + (a >> 12)) < 0x10000) { a = t; r += 0x00017; }
    if ((t = a + (a >> 13)) < 0x10000) { a = t; r += 0x0000c; }
    if ((t = a + (a >> 14)) < 0x10000) { a = t; r += 0x00006; }
    if ((t = a + (a >> 15)) < 0x10000) { a = t; r += 0x00003; }
    if ((t = a + (a >> 16)) < 0x10000) { a = t; r += 0x00001; }
    return r;
}

/* Compute floor(log(u)/log(1-p)) for 0 < u < 1 and 0 < p < 1,
   where 'u' and 'p' are represented as 0.16 fixed-point numbers
   Result is an integer in range [0, 1048576]
*/
uint32_t func (uint16_t u, uint16_t p)
{
    uint16_t one_minus_p = 0x10000 - p; // 1.0 - p; requires p > 0 so the value fits in 16 bits
    uint32_t log_u = nlog2_16 (u);
    uint32_t log_p = nlog2_16 (one_minus_p);
    uint32_t res = log_u / log_p;  // divide and floor in one go
    return res;
}
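
As a quick host-side check, the test case mentioned above can be reproduced with a minimal main() (illustrative only; it assumes it is appended to the code above, which already includes <stdio.h>):

/* exercise the example from the text: u = 0x55af, p = 0x0052
   expected fixed-point result 848, double-precision reference 874 */
int main (void)
{
    printf ("func(0x55af, 0x0052) = %u\n", (unsigned) func (0x55af, 0x0052));
    return 0;
}
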
njuffa
  • What an awesome solution! Works like a charm, carefully explained, I learned a ton of things on the way... Thank you! – mqtthiqs Aug 23 '15 at 08:53
  • @mqtthiqs Creating the answer took four hours of concentrated work, as I last worked with fixed-point arithmetic more than 10 years ago (~200 MHz ARM processors without FPU at the time). It was fun though to refresh my working knowledge of fixed-point computation, so thanks for asking the question :-) – njuffa Aug 23 '15 at 09:52
  • Great! Thank you again! – mqtthiqs Aug 23 '15 at 14:18
4

The maximum value of this function basically depends on the precision limit; that is, on how close the fixed-point values can get to the limits (u -> 0) or (1 - p -> 1).

If we assume (k) fractional bits, with the extreme representable values u = (2^-k) and 1 - p = 1 - (2^-k),
then the maximum value is: k / (k - log2(2^k - 1))

(Since the result is a ratio of logarithms, we are free to use any base, e.g., lb(x) or log2 in place of ln.)
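
For example, with k = 10 this bound works out to 10 / (10 - log2(1023)) ≈ 7094, and with k = 16 to roughly 7.3 × 10^5.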


Unlike njuffa's answer, I went with a lookup table approach, settling on k = 10 fractional bits to represent 0 < frac(u) < 1024 and 0 < frac(p) < 1024. This requires a log table with 2^k entries. Using 32-bit table values, we're only looking at a 4KiB table.

Any more than that, and you are using enough memory that you could seriously consider using the relevant parts of a 'soft-float' library; e.g., k = 16 would yield a 256 KiB LUT.

We're computing the values -log2(i / 1024.0) for 0 < i < 1024. Since these values lie in the interval (0, k], we need only 4 binary digits to store the integral part. So we store the precomputed LUT in 32-bit [4.28] fixed-point format:

uint32_t lut[1024]; /* never use lut[0] */

/* table filled as below (e.g., offline on a host, since the target lacks log2()) */
for (uint32_t i = 1; i < 1024; i++)
    lut[i] = (uint32_t) (-(log2(i / 1024.0) * 268435456.0));

Given u, p represented by [0.10] fixed-point values in [1, 1023]:

uint32_t func (uint16_t u, uint16_t p)
{
    /* assert: 0 < u, p < 1024 */

    return lut[u] / lut[1024 - p];
}
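
For instance, with the [4.28] table above, func(512, 256) (i.e. u = 0.5, p = 0.25) returns 2, matching floor(ln(0.5)/ln(0.75)) = floor(2.409...) = 2.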

We can easily test all valid (u, p) pairs against the 'naive' floating-point evaluation:

floor(log(u / 1024.0) / log(1.0 - p / 1024.0))

and only get a mismatch (+1 too high) on the following cases:

u =  193, p =    1 :  1708 vs  1707 (1.7079978488147417e+03)
u =  250, p =  384 :     3 vs     2 (2.9999999999999996e+00)
u =  413, p =    4 :   232 vs   231 (2.3199989016957960e+02)
u =  603, p =    1 :   542 vs   541 (5.4199909906444600e+02)
u =  680, p =    1 :   419 vs   418 (4.1899938077226307e+02)
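
A minimal sketch of such an exhaustive comparison (assuming `lut[]` and `func()` from above are defined earlier in the same file, and that the host provides `math.h`; link with -lm):

#include <math.h>
#include <stdint.h>
#include <stdio.h>

int main (void)
{
    /* build the [4.28] table on the host, as in the loop above */
    for (uint32_t i = 1; i < 1024; i++)
        lut[i] = (uint32_t) (-(log2 (i / 1024.0) * 268435456.0));

    /* compare every valid (u, p) pair against the double-precision reference */
    for (unsigned u = 1; u < 1024; u++) {
        for (unsigned p = 1; p < 1024; p++) {
            uint32_t res = func ((uint16_t) u, (uint16_t) p);
            uint32_t ref = (uint32_t) floor (log (u / 1024.0) / log (1.0 - p / 1024.0));
            if (res != ref)
                printf ("u = %4u, p = %4u : %5u vs %5u\n", u, p, (unsigned) res, (unsigned) ref);
        }
    }
    return 0;
}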

Finally, it turns out that using the natural logarithm in a [3.29] fixed-point format gives us even higher precision, where:

lut[i] = (uint32_t) (-(log(i / 1024.0) * 536870912.0));

only yields a single 'mismatch', though 'bignum' precision suggests it's correct:

u =  250, p =  384 :     3 vs     2 (2.9999999999999996e+00)
Brett Hale
  • Thank you Brett! (and sorry for the delay) Nice solution too. Tried it, works like a charm; I chose njuffa's because space is tight on the microcontroller and I don't need that much precision but I'll definitely keep it in mind. Cheers! – mqtthiqs Aug 25 '15 at 08:59