Binary to decimal (on huge numbers)

Question

I am building a C library on big integer number. Basically, I'm seeking a fast algorythm to convert any integer in it binary representation to a decimal one

I saw JDK's Biginteger.toString() implementation, but it looks quite heavy to me, as it was made to convert the number to any radix (it uses a division for each digits, which should be pretty slow while dealing with thousands of digits).

So if you have any documentations / knowledge to share about it, I would be glad to read it.

EDIT: more precisions about my question:

Let P a memory address
Let N be the number of bytes allocated (and set) at P

How to convert the integer represented by the N bytes at address P (let's say in little endian to make things simpler), to a C string

Example:

N = 1
P = some random memory address storing '00101010'
out string = "42"

Thank for your answer still

Post an example of your "number in it binary representation" — chux - Reinstate Monica, Apr 16 '16 at 13:40
Be more clear in what exactly you want to convert to what. "binary representation" as a string to integer like the question implies? Or integer to string like the function name implies ? or ...? — Unimportant, Apr 16 '16 at 13:45
Not sure if it is more efficient but you can google for "double dabble" algorithm converting binary to BCD. — Marian, Apr 16 '16 at 13:52
Consider converting your binary to hex (which is simple just map each 4 bits to hex digit) and then convert to dec it is usually much faster. see [hex 2 dec string conversion](http://stackoverflow.com/a/18231860/2521214) Do not forget to preallocate destination string to avoid reallocations which are SLOW. — Spektre, Apr 17 '16 at 05:29
Thank you Spektre, I'll implement the hex solution, and the double dale algorythm, and perform some heavy tests / comparison — Romain Pereira, Apr 17 '16 at 17:10

Stefan Haustein · Accepted Answer · 2016-04-17T17:16:51.770

6

The reason for the BigInteger.toString method looking heavy is doing the conversion in chunks.

A trivial algorithm would take the last digits and then divide the whole big integer by the radix until there is nothing left.

One problem with this is that a big integer division is quite expensive, so the number is subdivided into chunks that can be processed with regular integer division (opposed to BigInt division):

static String toDecimal(BigInteger bigInt) {
  BigInteger chunker = new BigInteger(1000000000);
  StringBuilder sb = new StringBuilder();
  do {
    int current = bigInt.mod(chunker).getInt(0);
    bigInt = bigInt.div(chunker);
    for (int i = 0; i < 9; i ++) {
      sb.append((char) ('0' + remainder % 10));
      current /= 10;
      if (currnet == 0 && bigInt.signum() == 0) {
        break;
      }
    }
  } while (bigInt.signum() != 0);
  return sb.reverse().toString();
}

That said, for a fixed radix, you are probably even better off with porting the "double dabble" algorithm to your needs, as suggested in the comments: https://en.wikipedia.org/wiki/Double_dabble

edited Apr 17 '16 at 17:16

answered Apr 16 '16 at 14:36

Stefan Haustein

18,427
3
36
51

Thank to you two, that's what I was looking for, and sorry for the others who didn't understand what I was asking. I have edited the question to be more clear :) – Romain Pereira Apr 17 '16 at 17:05
1

Actually, the best way to do this is using a divide-and-conquer algorithm, recursively dividing chunks into smaller chunks until the single chunks become small enough to be processed in a more conventional way. That saves a lot of divisions. My (Delphi) BigInteger.ToString() works that way, and it has become much faster than the naive approach. – Rudy Velthuis Apr 17 '16 at 23:26
Good point! But it depends on the sizes involved... gmp only switches to divide-and-conquer when the size exceeds a certain threshold; it is instructive to peruse their overview of division algorithms. Of course, radix conversion benefits from templating the code on the radix, since decent compilers will then implement division by the (constant) radix powers via multiplication with the inverse. At the bare metal level another speedup lies in converting/outputting two digit characters at a time, at least in languages like C# where one has to fight with one hand tied behind one's back... – DarthGizka Apr 27 '16 at 16:40
[Division algorithms employed by GMP](https://gmplib.org/manual/Division-Algorithms.html#Division-Algorithms) – DarthGizka Apr 27 '16 at 16:41

score 0 · Answer 2 · answered Dec 25 '20 at 04:49

I recently got the challenge to print a big mersenne prime: 2**82589933-1. On my CPU that takes ~40 minutes with apcalc and ~120 minutes with python 2.7. It's a number with 24 million digits and a bit.

Here is my own little C code for the conversion:

// print 2**82589933-1

#include <stdio.h>
#include <math.h>
#include <stdint.h>
#include <inttypes.h>
#include <string.h>

const uint32_t exponent = 82589933;
//const uint32_t exponent = 100;
//outputs 1267650600228229401496703205375
const uint32_t blocks = (exponent + 31) / 32;
const uint32_t digits = (int)(exponent * log(2.0) / log(10.0)) + 10;

uint32_t num[2][blocks];
char out[digits + 1];

// blocks : number of uint32_t in num1 and num2
// num1   : number to convert
// num2   : free space
// out    : end of output buffer
void conv(uint32_t blocks, uint32_t *num1, uint32_t *num2, char *out) {
  if (blocks == 0) return;
  const uint32_t div = 1000000000;
  uint64_t t = 0;
  for (uint32_t i = 0; i < blocks; ++i) {
    t = (t << 32) + num1[i];
    num2[i] = t / div;
    t = t % div;
  }
  for (int i = 0; i < 9; ++i) {
    *out-- = '0' + (t % 10);
    t /= 10;
  }
  if (num2[0] == 0) {
    --blocks;
    num2++;
  }
  conv(blocks, num2, num1, out);
}

int main() {
  // prepare number
  uint32_t t = exponent % 32;
  num[0][0] = (1LLU << t) - 1;
  memset(&num[0][1], 0xFF, (blocks - 1) * 4);
  // prepare output
  memset(out, '0', digits);
  out[digits] = 0;
  // convert to decimal
  conv(blocks, num[0], num[1], &out[digits - 1]);
  // output number
  char *res = out;
  while(*res == '0') ++res;
  printf("%s\n", res);
  return 0;
}

The conversion is destructive and tail recursive. In each step it divides num1 by 1_000_000_000 and stores the result in num2. The remainder is added to out. Then it calls itself with num1 and num2 switched and often shortened by one (blocks is decremented). out is filled from back to front. You have to allocate it large enough and then strip leading zeroes.

Python seems to be using a similar mechanism for converting big integers to decimal.

Want to do better?

For large number like in my case each division by 1_000_000_000 takes rather long. At a certain size a divide&conquer algorithm does better. In my case the first division would be by dividing by 10 ^ 16777216 to split the number into divident and remainder. Then convert each part separately. Now each part is still big so split again at 10 ^ 8388608. Recursively keep splitting till the numbers are small enough. Say maybe 1024 digits each. Those convert with the simple algorithm above. The right definition of "small enough" would have to be tested, 1024 is just a guess.

While the long division of two big integer numbers is expensive, much more so than a division by 1_000_000_000, the time spend there is then saved because each separate chunk requires far fewer divisions by 1_000_000_000 to convert to decimal.

And if you have split the problem into separate and independent chunks it's only a tiny step away from spreading the chunks out among multiple cores. That would really speed up the conversion another step. It looks like apcalc uses divide&conquer but not multi-threading.

Binary to decimal (on huge numbers)

2 Answers2