4

Given a big decimal integer, how do I compute the bit-length, i.e. the number of digits of its binary representation?

Example: bitlength("590295810358705712624") == 70

The arithmetic expression is: bitlength = ⌊log₂(n)⌋ + 1

For small integers, this expression can be translated to standard library calls. But what about big integers with an arbitrary number of digits?

We could compute a very close estimate of log₂ from one or two leading digits:

log₂(8192) ≥ log₂(8100) = log₂(81) + log₂(10) * 2 = 12.98...

Plugging this into the arithmetic expression above, we get a very good lower bound for the bit-length. But in some cases, we have to inspect more digits, potentially up to the least significant one, in order to get an exact result:

bitlength("1125899906842623") == 50
bitlength("1125899906842624") == 51

Any suggestions on how to compute the bit-length exactly and efficiently in all cases?

le_m
  • 19,302
  • 9
  • 64
  • 74
  • Why not use traditional method of dividing by 2 until number reduces to 0? – vish4071 Mar 22 '17 at 17:08
  • Do you want the result in constant time? – vish4071 Mar 22 '17 at 17:10
  • @vish4071 Dividing by 2 till we arrive at 0 is a valid approach, but the above insights suggest we could come up with a faster solution for most cases by inspecting only a certain number of most significant digits. – le_m Mar 22 '17 at 17:14
  • `⌊log₂(n)⌋ + 1` is **wrong**, its actually `⌈log₂(n)⌉` – specializt Mar 22 '17 at 17:18
  • 3
    @specializt Care to explain? Isn't `⌈log₂(8)⌉ == 3` but `bitlength("8") == 4`? – le_m Mar 22 '17 at 17:24
  • well theres not much to explain here - 3 bits can represent exactly 8 different decimal numbers. Its pretty basic math, really - you just need to include the decimal symbol `0` – specializt Mar 22 '17 at 17:36
  • @specializt Yes, 3 bits can represent 8 different numbers. But that's not what I am looking for. The bit-length is the number of binary *digits* needed to represent a number. 8 in binary is 1000, which has *4* digits – le_m Mar 22 '17 at 17:39
  • @Specializt: It depends. le_m is right about 8 and any other power of 2. Just count the limbs below the top one, multiply by the size of a limb (in bits, e.g. 32), and calculate the bitlength of the top limb. I think Java has functions for that. Add the bitlength of the top limb to the value calculated on the lower limbs. Negative values also need a special treatment (decrement by one if a power of two). – Rudy Velthuis Mar 24 '17 at 19:21
  • @le_m, if you can read Pascal a bit, take a look at my BigInteger implementation. It also contains assembler, but that can be ignored (there is a plain Pascal implementation for each method too). https://github.com/rvelthuis/BigNumbers/blob/master/Source/Velthuis.BigIntegers.pas – Rudy Velthuis Mar 24 '17 at 19:39
  • @RudyVelthuis Thanks, I'll do that - actually programmed a lot in Turbo Pascal including assembler for those BIOS graphic modes... – le_m Mar 24 '17 at 19:41
  • Then it won't be too hard to read. 32 bit assembler is slightly different, but not that much. And the PUREPASCAL parts explain what the assembler does. – Rudy Velthuis Mar 24 '17 at 19:44

4 Answers4

2

Computing the lower and upper bound of the approximate logarithm and accepting the result when both are equal, we can finish our computation in more than 90% of all cases in one step. In all other cases, we divide the input by two and repeat recursively.

Given an input with n digits and assuming that division by 2 is in O(n), the runtime complexities are as follows:

  • O(1) in the best case
  • O(n) on average
  • O(n²) in the worst case

I still think that the average and worst case runtime complexity can be improved.

Below is an exemplary implementation of the described algorithm:

// Divide n by 2:
function half(n) {
  let shrink = n[0] > 1 ? 0 : 1;
  let result = new Array(n.length - shrink);
  let carry = 0.5 * shrink;
  for (let i = 0; i < result.length; ++i) {
    let d = n[i + shrink] * 0.5 + carry * 10;
    carry = d % 1;
    result[i] = Math.floor(d);
  }
  return result;
}

// Compute bit-length of n:
function bitlength(n) {
  if (n.length < 2) return Math.floor(Math.log2(n[0]            )) + 1;
  if (n.length < 3) return Math.floor(Math.log2(n[0] * 10 + n[1])) + 1;

  let lower = Math.floor(Math.log2(n[0] * 10 + n[1]    ) + Math.log2(10) * (n.length - 2));
  let upper = Math.floor(Math.log2(n[0] * 10 + n[1] + 1) + Math.log2(10) * (n.length - 2));
  if (lower === upper) return lower + 1;

  return bitlength(half(n)) + 1;
}

// Example:
console.log(bitlength([5,9,0,2,9,5,8,1,0,3,5,8,7,0,5,7,1,2,6,2,4]));

Above implementation can be optimized, e.g. using lookup tables. However, in order to improve the runtime complexity, we would need to come up with a better solution to dividing n by 2. Feel free to comment.

le_m
  • 19,302
  • 9
  • 64
  • 74
  • Dividing a large BigInteger by 2 each time is pretty slow. Also no need to use a log2(). Just calculate the bit length of the top limb, and multiply the number of limbs below that with the bitsize (e.g. 32). Add those two numbers, and you're fine. That should be **a lot** faster than anything above. There are simple algorithms to calculate the bitlength of a 32 bit integer. – Rudy Velthuis Mar 24 '17 at 19:27
  • @RudyVelthuis Thanks for your input! You are right, the division is problematic. What does 'limbs' refer to - digits? – le_m Mar 24 '17 at 19:34
  • @RudyVelthuis Ok, [found the definition](http://stackoverflow.com/questions/29710267/limb-in-the-vocabulary-of-arbitrary-precision-integer). Will have a look. – le_m Mar 24 '17 at 19:47
  • In my implementation, each limb is an unsigned 32 bit integer and a big integer is simply an array of such UInt32s. In other words, each limb represents 32 bits of the total value, instead of a single decimal digit. That allows me to have pretty compact big integers. This what Java does too, BTW. You could see a limb as a huge digit, base 2^32. – Rudy Velthuis Mar 24 '17 at 19:52
  • FWIW, the word "limb" comes from Donald Knuth, AFAIK. If you want to do division or multiplication of large integers, read his books (but they are not cheap), or look for a simplified version of his code, called `divmnu.c` My BigInteger also uses the Burnikel-Ziegler algorithm, but that is for very large integers only. – Rudy Velthuis Mar 24 '17 at 19:57
0

I once wrote an answer to how you convert an arbitrary sized number given as a string into a string of hexadecimal digits: https://stackoverflow.com/a/2653006/64121

You could very easily adapt that into a binary string and take the length of it (or just calc the length directly). An even lazier way would be to keep the hexadecimal version and count all digits but the first one as four biniray digits and finally examine the exact bit length of the first digit.

Community
  • 1
  • 1
Dan Byström
  • 9,067
  • 5
  • 38
  • 68
  • A valid but seemingly slow approach, would be in O(n²) for a number with n digits, right? – le_m Mar 22 '17 at 17:19
  • That would be correct in its given form. But with a little smarter, real life coding, we could build the result in ulongs instead of bytes, and then the inner loop would iterate far fewer times (/8). – Dan Byström Mar 22 '17 at 17:41
0

This is how I do it in my BigInteger implementation:

result = (number_of_limbs - 1) * 32 + Integer.bitlength(top_limb);

You should know the number of limbs. In my case, they are 32 bit unsigned integers. Just the bits of the top limb needs to be handled, but a simple algorithm like below will do that for you:

if (n > 65535)
{
    result += 16;
    n >>= 16;
}
if (n > 255)
{
    result += 8;
    n >>= 8;
}
etc...

You must probably adjust for negative numbers, but that is quite simple: decrement the number you found by 1 if the top limb is a power of two.

Rudy Velthuis
  • 28,387
  • 5
  • 46
  • 94
  • Thanks for your contribution. It seems to handle big integers whose parts (limbs) are already stored in binary, thus looking at the most significant bit(s) seems like the way to go. Now consider a decimal input (e.g. a string "1234") which has not yet been converted to binary. We *could* convert it to binary and use your method to count the bit-length. But converting to binary is in itself an expensive operation which could be avoided by working purely with the decimal representation. – le_m Mar 24 '17 at 19:57
  • E.g. your BigNumber library's `InternalParseDecimal` performs n `InternalMultiplyAndAdd32` operations for an n-digit decimal, which in total is probably >= O(n²). You estimate the bit-length of a decimal by `Value.MakeSize(Length(S) div CStringMinLengths[Base] + 4);` which is pretty close to the exact result already. Getting the exact result directly from just the decimal digits is what I am looking for. – le_m Mar 24 '17 at 20:36
  • Sure, you could work with the decimal representation, but if you want to multiply, divide, add or subtract, working with 32 bit unsigned integers is a lot faster. Note that I did not convert the decimal to internal code yet, but I already converted the internal to decimal (or any base) code, and that uses a divide-and-conquer algorithm. Creating a BigInteger out of decimal digits is going to be very slow for large operations. Just note that Java also uses 32 bit integers (and they are treated as unsigned integers, internally). Java's BigInteger is well optimized. – Rudy Velthuis Mar 24 '17 at 21:12
  • I agree, a real world use-case usually involves more than just counting the bit-length. But this question didn't arise from a real-world use case but a theoretical exercise while thinking about an answer to [this question](http://stackoverflow.com/questions/42915607/javascript-big-integer-from-string-to-base-2). – le_m Mar 24 '17 at 21:22
  • Well, the answer to that question also uses a bigint library to turn the string into a big int, right? – Rudy Velthuis Mar 24 '17 at 21:25
  • Yes, and the question has been modified to include `assuming that it will be converted to base 2 number representation`. Still, it was the inspiration to my question here. – le_m Mar 24 '17 at 21:29
  • It is not possible to get the bit length from a decimal representation directly. You will have to convert to single bits or a larger set of bits (hex, bytes, integers, etc.) You can estimate it, but only roughly, if you know that a 32 bit integer can hold 9 or 10 digits (and assume the worst). – Rudy Velthuis Mar 24 '17 at 21:33
  • Correct, we can estimate the number of bits (by looking at the most significant decimal digit(s)). And in most cases, the estimate is already exact! So if we are lucky, we can finish in O(1). Only in the worst case, if our input is a power of 2, we have to continue the whole conversion. Anyway, that is my best answer to this problem so far. – le_m Mar 24 '17 at 21:40
  • @le_m: if you can find a way, please share it. I don't think it is that easy. – Rudy Velthuis Mar 25 '17 at 04:46
0

If only the approximate number of bits is needed for a given BigDecimal n, use unscaledValue() to access the number represented as a BigInteger without the scale factor, then access the length of its associated byte array:

int bytes = n.unscaledValue().toByteArray().length;
int bits = bytes * 8;

This will yield the upper bound for the required number of bits.

aaroncarsonart
  • 1,054
  • 11
  • 27