5

I need to calculate the entropy and due to the limitations of my system I need to use restricted C features (no loops, no floating point support) and I need as much precision as possible. From here I figure out how to estimate the floor log2 of an integer using bitwise operations. Nevertheless, I need to increase the precision of the results. Since no floating point operations are allowed, is there any way to calculate log2(x/y) with x < y so that the result would be something like log2(x/y)*10000, aiming at getting the precision I need through arithmetic integer?

ascub
  • 95
  • 1
  • 11
  • Maybe try `frexp` or [related functions](https://en.cppreference.com/w/cpp/numeric/math/frexp)? – Kerrek SB Dec 16 '18 at 02:19
  • @KerrekSB I think that won't work because I have no way to store the fractional part for later processing (all I can use are `int` type variables). The whole idea is to transform `log2(x/y)` into `log2(x)-log2(y)` and then somehow insert the arithmetic integer into de logarith computation to obtain the otherwise fractional part as an arbitrary large integer. – ascub Dec 16 '18 at 02:27
  • Oh, sorry, I realize now that you want to compute the log2 of an integer. I was confused by your question title which says "log2 of float". – Kerrek SB Dec 16 '18 at 02:35
  • @KerrekSB my bad, I've edited the title accordingly – ascub Dec 16 '18 at 02:37
  • 2
    https://graphics.stanford.edu/~seander/bithacks.html#IntegerLogObvious [Integer Log2 implemented using binary search](https://codereview.stackexchange.com/q/151312/34190), [What's the quickest way to compute log2 of an integer in C#?](https://stackoverflow.com/q/8970101/995714), [Fast computing of log2 for 64-bit integers](https://stackoverflow.com/q/11376288/995714) – phuclv Dec 16 '18 at 02:47
  • @phuclv, the lookup table version is going to faster https://graphics.stanford.edu/~seander/bithacks.html#IntegerLogLookup (for that matter, so is https://graphics.stanford.edu/~seander/bithacks.html#IntegerLogDeBruijn) – nemequ Dec 16 '18 at 03:26
  • 1
    @nemequ I know. I just give the first solution so when people scroll down they can see all the possible ways – phuclv Dec 16 '18 at 03:52
  • 1
    Searching the web for “fixed-point base-two logarithm” shows a number of relevant results, including [this Stack Overflow question](https://stackoverflow.com/questions/4657468/fast-fixed-point-pow-log-exp-and-sqrt). – Eric Postpischil Dec 16 '18 at 12:22
  • @chux: Your question about expressing the result is answered in the question, which gives `log2(x/y)*10000` as a sample form of acceptable result and asks about “something like” that, which indicates alternative formats are acceptable. – Eric Postpischil Dec 16 '18 at 12:24
  • This previous [answer](https://stackoverflow.com/questions/32159300/compute-logarithmic-expression-without-floating-point-arithmetics-or-log/32163873#32163873) may be helpful, showing how to extract the binary logarithm one bit at a time. – njuffa Dec 17 '18 at 17:58

1 Answers1

4

You will base an algorithm on the formula

log2(x/y) = K*(-log(x/y));

where

 K        = -1.0/log(2.0); // you can precompute this constant before run-time
 a        = (y-x)/y;
-log(x/y) = a + a^2/2 + a^3/3 + a^4/4 + a^5/5 + ...

If you write the loop correctly—or, if you prefer, unroll the loop to code the same sequence of operations looplessly—then you can handle everything in integer operations:

(y^N*(1*2*3*4*5*...*N)) * (-log(x/y))
  = y^(N-1)*(2*3*4*5*...*N)*(y-x) + y^(N-2)*(1*3*4*5*...*N)*(y-x)^2 + ...

Of course, ^, the power operator, binding tighter than *, is not a C operator, but you can implement that efficiently in the context of your (perhaps unrolled) loop as a running product.

The N is an integer large enough to afford desired precision but not so large that it overruns the number of bits you have available. If unsure, then try N = 6 for instance. Regarding K, you might object that that is a floating-point number, but this is not a problem for you because you are going to precompute K, storing it as a ratio of integers.

SAMPLE CODE

This is a toy code but it works for small values of x and y such as 5 and 7, thus sufficing to prove the concept. In the toy code, larger values can silently overflow the default 64-bit registers. More work would be needed to make the code robust.

#include <stddef.h>
#include <stdlib.h>
// Your program will not need the below headers, which are here
// included only for comparison and demonstration.
#include <math.h>
#include <stdio.h>

const size_t     N = 6;
const long long Ky = 1 << 10; // denominator of K
// Your code should define a precomputed value for Kx here.

int main(const int argc, const char *const *const argv)
{
    // Your program won't include the following library calls but this
    // does not matter.  You can instead precompute the value of Kx and
    // hard-code its value above with Ky.
    const long long Kx = lrintl((-1.0/log(2.0))*Ky); // numerator of K
    printf("K == %lld/%lld\n", Kx, Ky);

    if (argc != 3) exit(1);

    // Read x and y from the command line.
    const long long x0 = atoll(argv[1]);
    const long long y  = atoll(argv[2]);
    printf("x/y == %lld/%lld\n", x0, y);
    if (x0 <= 0 || y <= 0 || x0 > y) exit(1);

    // If 2*x <= y, then, to improve accuracy, double x repeatedly
    // until 2*x > y. Each doubling offsets the log2 by 1. The offset
    // is to be recovered later.
    long long               x = x0;
    int integral_part_of_log2 = 0;
    while (1) {
        const long long trial_x = x << 1;
        if (trial_x > y) break;
        x = trial_x;
        --integral_part_of_log2;
    }
    printf("integral_part_of_log2 == %d\n", integral_part_of_log2);

    // Calculate the denominator of -log(x/y).
    long long yy = 1;
    for (size_t j = N; j; --j) yy *= j*y;

    // Calculate the numerator of -log(x/y).
    long long xx = 0;
    {
        const long long y_minus_x = y - x;
        for (size_t i = N; i; --i) {
            long long term = 1;
            size_t j       = N;
            for (; j > i; --j) {
                term *= j*y;
            }
            term *= y_minus_x;
            --j;
            for (; j; --j) {
                term *= j*y_minus_x;
            }
            xx += term;
        }
    }

    // Convert log to log2.
    xx *= Kx;
    yy *= Ky;

    // Restore the aforementioned offset.
    for (; integral_part_of_log2; ++integral_part_of_log2) xx -= yy;

    printf("log2(%lld/%lld) == %lld/%lld\n", x0, y, xx, yy);
    printf("in floating point, this ratio of integers works out to %g\n",
      (1.0*xx)/(1.0*yy));
    printf("the CPU's floating-point unit computes the log2 to be  %g\n",
      log2((1.0*x0)/(1.0*y)));

    return 0;
}

Running this on my machine with command-line arguments of 5 7, it outputs:

K == -1477/1024
x/y == 5/7
integral_part_of_log2 == 0
log2(5/7) == -42093223872/86740254720
in floating point, this ratio of integers works out to -0.485279
the CPU's floating-point unit computes the log2 to be  -0.485427

Accuracy would be substantially improved by N = 12 and Ky = 1 << 20, but for that you need either thriftier code or more than 64 bits.

THRIFTIER CODE

Thriftier code, wanting more effort to write, might represent numerator and denominator in prime factors. For example, it might represent 500 as [2 0 3], meaning (22)(30)(53).

Yet further improvements might occur to your imagination.

AN ALTERNATE APPROACH

For an alternate approach, though it might not meet your requirements precisely as you have stated them, @phuclv has given the suggestion I would be inclined to follow if your program were mine: work the problem in reverse, guessing a value c/d for the logarithm and then computing 2^(c/d), presumably via a Newton-Raphson iteration. Personally, I like the Newton-Raphson approach better. See sect. 4.8 here (my original).

MATHEMATICAL BACKGROUND

Several sources including mine already linked explain the Taylor series underlying the first approach and the Newton-Raphson iteration of the second approach. The mathematics unfortunately is nontrivial, but there you have it. Good luck.

thb
  • 13,796
  • 3
  • 40
  • 68
  • I've been trying to follow your recommendations but no success so far, I think I cannot fully understand your reasoning (e.g. to find `K`, `a`, and the `log(x/y)` series). Even without understanding it I tried to code it but the results are wrong. Could you please explain further and if possible provide a coding example? – ascub Dec 16 '18 at 14:43
  • Sure, why not? You have asked at a convenient moment, as it happens. I haven't written this kind of code in years, so the exercise serves to refresh old skills. Code posted. There you go. – thb Dec 16 '18 at 19:06
  • @phuclv : for information, my answer has adopted your comment. Thanks. – thb Dec 16 '18 at 19:14
  • this does exactly what I need. Thanks a lot! Once I figure out how to modify the code in order to satisfy larger number estimations I will post the update version. Thanks again! – ascub Dec 16 '18 at 19:47