
I'm dealing with the BigInteger class with numbers on the order of 2 raised to the power 10,000,000.

The BigInteger Log function is now the most expensive function in my algorithm and I am desperately looking for an alternative.

Since I only need the integral part of the log, I came across this answer which seems brilliant in terms of speed but for some reason I am not getting accurate values. I do not care about the decimal part but I do need to get an accurate integral part whether the value is floored or ceiled as long as I know which.

Here is the function I implemented:

public static double LogBase2(System.Numerics.BigInteger number)
{
    return LogBase2(number.ToByteArray());
}

public static double LogBase2(byte[] bytes)
{
    // Corrected based on ronalchn's answer.
    return System.Math.Log(bytes[bytes.Length - 1], 2) + ((bytes.Length - 1) * 8);
}

The values are now incredibly accurate except for corner cases: inputs whose log falls in 7 to 7.99999, 15 to 15.9999, 23 to 23.9999, 31 to 31.9999, etc. return -Infinity. The failures seem to cluster around byte boundaries. Any idea what's going on here?

Example:

LogBase2(                    1081210289) = 30.009999999993600 != 30.000000000000000
LogBase2(                    1088730701) = 30.019999999613300 != 30.000000000000000
LogBase2(                    2132649894) = 30.989999999389400 != 30.988684686772200
LogBase2(                    2147483648) = 31.000000000000000 != -Infinity
LogBase2(                    2162420578) = 31.009999999993600 != -Infinity
LogBase2(                    4235837212) = 31.979999999984800 != -Infinity
LogBase2(                    4265299789) = 31.989999999727700 != -Infinity
LogBase2(                    4294967296) = 32.000000000000000 != 32.000000000000000
LogBase2(                    4324841156) = 32.009999999993600 != 32.000000000000000
LogBase2(                  545958373094) = 38.989999999997200 != 38.988684686772200
LogBase2(                  549755813887) = 38.999999999997400 != 38.988684686772200
LogBase2(                  553579667970) = 39.009999999998800 != -Infinity
LogBase2(                  557430119061) = 39.019999999998900 != -Infinity
LogBase2(                  561307352157) = 39.029999999998300 != -Infinity
LogBase2(                  565211553542) = 39.039999999997900 != -Infinity
LogBase2(                  569142910795) = 39.049999999997200 != -Infinity
LogBase2(                 1084374326282) = 39.979999999998100 != -Infinity
LogBase2(                 1091916746189) = 39.989999999998500 != -Infinity
LogBase2(                 1099511627775) = 39.999999999998700 != -Infinity
Raheel Khan
  • Is it just a massive coincidence that none of these overestimate the log? I would've thought all it would take was LSB > MSB. – Rawling Aug 17 '12 at 10:20
  • Oh wait, there are a few. Still, I'd have expected it to be closer to 50/50. – Rawling Aug 17 '12 at 10:31
  • Hmmm... it seems as if BigInteger seems to keep an extra most significant byte when the integral number reaches multiples of 8? The -Infinity is the result of Log(0). – Raheel Khan Aug 17 '12 at 12:25
  • The extra byte ensures that the most significant bit is 0 for a positive integer. – ronalchn Aug 17 '12 at 14:21
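The sign-byte diagnosis in the comments above can be illustrated with Java's `java.math.BigInteger` (note its `toByteArray()` is big-endian, so the extra sign byte appears at index 0, whereas .NET's is little-endian and puts it at the end). This is a sketch of the padding behavior, not the asker's code:

```java
import java.math.BigInteger;
import java.util.Arrays;

public class SignByteDemo {
    public static void main(String[] args) {
        // 127 fits in one byte with the high bit clear: no padding needed.
        System.out.println(Arrays.toString(BigInteger.valueOf(127).toByteArray()));
        // prints [127]

        // 128 would set the high bit, which would read as negative,
        // so a leading 0x00 sign byte is prepended.
        System.out.println(Arrays.toString(BigInteger.valueOf(128).toByteArray()));
        // prints [0, -128]

        // 2^31 likewise carries a leading zero sign byte; feeding that zero
        // byte into a log gives -Infinity, matching the corner cases above.
        System.out.println(BigInteger.ONE.shiftLeft(31).toByteArray()[0]);
        // prints 0
    }
}
```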

3 Answers


Try this:

public static int LogBase2(byte[] bytes)
{
    // Negative BigInteger (invalid - cannot take the log of a negative number).
    if (bytes[bytes.Length - 1] >= 128) return -1;
    int log = 0;
    while ((bytes[bytes.Length - 1] >> log) > 0) log++;
    return log + bytes.Length * 8 - 9;
}

The reason the most significant byte can be 0 is that BigInteger is a signed integer. When the most significant bit of the high-order byte would be 1, an extra byte is tacked on so that the sign bit is 0 for positive integers.

I also changed it to avoid the System.Math.Log function: if you only need the integral part, bit operations are much faster.
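The same shift-loop idea can be sketched in Java as a cross-check (Java's `toByteArray()` is big-endian, so the high-order byte is index 0 rather than the last element; `logBase2` is my name for the sketch, not a library method):

```java
import java.math.BigInteger;

public class FloorLog2 {
    // Floor of log2 via a shift loop over the most significant byte.
    static int logBase2(BigInteger n) {
        if (n.signum() <= 0) return -1;  // log undefined for non-positive values
        byte[] bytes = n.toByteArray();
        int msb = bytes[0] & 0xFF;       // may be 0 when a sign byte was padded
        int log = 0;
        while ((msb >> log) > 0) log++;
        return log + bytes.length * 8 - 9;
    }

    public static void main(String[] args) {
        System.out.println(logBase2(BigInteger.valueOf(2147483648L))); // prints 31
        System.out.println(logBase2(BigInteger.ONE.shiftLeft(100)));   // prints 100
    }
}
```

Note how 2^31 works out: its array is 5 bytes with a zero sign byte on top, so the loop contributes 0 and the result is 5 * 8 - 9 = 31.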


If you have Microsoft Solver Foundation (download at http://msdn.microsoft.com/en-us/devlabs/hh145003.aspx), then you can use the BitCount() function:

public static double LogBase2(Microsoft.SolverFoundation.Common.BigInteger number)
{
    return number.BitCount;
}

Or you can use the Java library: add a reference to the vjslib library (found in the .NET tab - this is the J# implementation of the Java class library).

You can now add "using java.math;" to your code.

java.math.BigInteger has a bitLength() function
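In native Java (rather than the J# bridge) the same `bitLength()` call looks like this; for n > 0 it returns floor(log2(n)) + 1, so subtracting one gives the integral log:

```java
import java.math.BigInteger;

public class BitLengthDemo {
    public static void main(String[] args) {
        // bitLength() is the number of bits in the minimal two's-complement
        // representation, excluding the sign bit: floor(log2(n)) + 1 for n > 0.
        BigInteger n = BigInteger.valueOf(2147483648L); // 2^31
        System.out.println(n.bitLength());      // prints 32
        System.out.println(n.bitLength() - 1);  // prints 31 == floor(log2(n))
    }
}
```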

ronalchn
  • @ronalchn: Yup, `ToByteArray` is little-endian (http://msdn.microsoft.com/en-us/library/system.numerics.biginteger.tobytearray.aspx) – Rawling Aug 17 '12 at 10:14
  • Comment deleted - apologies, my mistake. – David M Aug 17 '12 at 10:16
  • @ronalchn: Thanks, that worked but there seem to be some corner cases. I've edited the question. Please have a look. – Raheel Khan Aug 17 '12 at 12:20
  • For those it may help, this answer fixes the accuracy bug and takes care of negative integers. L.B's answer below adds that extra bit of optimization by using a look-up to avoid calculating Log2 for even the MSB. Thanks again. – Raheel Khan Aug 17 '12 at 19:49
  • netcore 3 has BitOperations.Log2 (https://learn.microsoft.com/en-us/dotnet/api/system.numerics.bitoperations.log2?view=netcore-3.0) that could replace the while loop. – Jeremy Lakeman Jun 17 '20 at 04:34
BigInteger bi = new BigInteger(128);
int log = bi.Log2();

public static class BigIntegerExtensions
{
    static int[] PreCalc = new int[] { 8, 7, 6, 6, 5, 5, 5, 5, 4, 4, 4, 4, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1};
    public static int Log2(this BigInteger bi)
    {
        byte[] buf = bi.ToByteArray();
        int len = buf.Length;
        return len * 8 - PreCalc[buf[len - 1]] - 1;
    }
}
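As the comments below note, the hard-coded table is really a count of leading zero bits in the top byte. A Java sketch of the same idea (building the table at startup instead of hard-coding it; `Log2Lookup` and `log2` are illustrative names) under the big-endian `toByteArray()` convention:

```java
import java.math.BigInteger;

public class Log2Lookup {
    // preCalc[b] = number of leading zero bits in the 8-bit value b,
    // mirroring L.B's hard-coded table.
    static final int[] preCalc = new int[256];
    static {
        preCalc[0] = 8; // a padded sign byte contributes no set bits
        for (int b = 1; b < 256; b++) {
            int zeros = 0;
            while (((b << zeros) & 0x80) == 0) zeros++;
            preCalc[b] = zeros;
        }
    }

    static int log2(BigInteger bi) {
        byte[] buf = bi.toByteArray(); // big-endian in Java: top byte is index 0
        return buf.length * 8 - preCalc[buf[0] & 0xFF] - 1;
    }

    public static void main(String[] args) {
        System.out.println(log2(BigInteger.valueOf(128)));      // prints 7
        System.out.println(log2(BigInteger.ONE.shiftLeft(31))); // prints 31
    }
}
```

The table lookup replaces the per-call shift loop with one array read, which is the entire speed advantage over the previous answer.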
L.B
  • Thank you. ronalchn's answer does fix the bug I posted so I will accept his answer out of fairness. Although accuracy is non-negotiable, I am willing to go lengths for speed as well. Could you please help me understand how this look-up is working in concept? I wouldn't want to use code without fully understanding it. Thanks again. – Raheel Khan Aug 17 '12 at 19:39
  • @RaheelKhan: All these answers do a log2 on only the top byte, of which there are 128 possible inputs. L.B has precalculated the log2 of each of these inputs, so the processor doesn't have to calculate them at runtime. This should be just as accurate as ronalchn's answer, but on modern computers this is faster. – Mooing Duck Aug 17 '12 at 19:43
  • @RaheelKhan In short, your `log2` is `bytes.Length*8`, but if MSB has leading zeros you should subtract them from the result. For ex, 64(0x40) has two leading zeros to subtract(in fact 1 since MS bit determines the *sign*) – L.B Aug 17 '12 at 19:51
  • By the way, do you think changing the `int[]` to `byte[]` will help by reducing an implicit cast or would the array indexer change it to an `int` anyways? – Raheel Khan Aug 17 '12 at 19:53
  • @RaheelKhan I tested it both ways and didn't see much change in speed. I left it as `int` since, maybe, accessing the numbers on the computer's word boundaries can be faster. – L.B Aug 17 '12 at 19:55
  • @MooingDuck In fact, it is precalculation of number of leading zeros not precalculation the log2 – L.B Aug 17 '12 at 20:04
  • Thanks again. Indeed, converting to byte doesn't improve anything. I just benchmarked your function against the conventional BigInteger.Log 10,000 times on a number with 4,500,000 bits and the results are 3.41 and 48.47 seconds respectively. ronalchn's function is pretty close but not quite. Cheers! – Raheel Khan Aug 17 '12 at 20:24
  • It depends heavily on the cache size of your computer, and what you do. Just using this function, if you have a level 1 cache of at least 1 KB, storing 256 4-byte integers won't matter. Storing them as bytes will use less cache size. However, if you also use other functions which might use some of your cache, the CPU might have to reload more of the memory into the cache. My solution uses much less cache size, so will be less likely to cause cache misses. It also causes less initialization time (less data in memory). – ronalchn Aug 17 '12 at 23:05

Years late but maybe this will help someone else...

.NET 5 added BigInteger.GetBitLength(), which is basically log2, just one too high (it returns floor(log2(n)) + 1 for positive n, so subtract one). Since it is built into .NET, I think this is about as fast as we can get.

// Create some example number
BigInteger myNum = (BigInteger)32;

// Get the Log2
BigInteger myLog2 = myNum.GetBitLength() - 1;

https://dotnetfiddle.net/7ggy4D

SunsetQuest