4

At the risk of having this question voted as a duplicate, or even closed, I have to ask it because it has come up in my own work.

Background

In "normal" data types such as int, long long, etc..., to convert from the binary numeric value to a decimal string, you would do the following (in pseudo code):

Set length = 0
Set divisor to largest base10 value the data type will hold (Divisor).
  Loop
    Divide number in question by divisor.
    Place result in a string at position length.
    Increment the length by 1.
    Divide the divisor by 10.
Reverse the string.
Print the string.

The actual implementation in (most) any language is quite trivial.

The Problem

The issue that I am encountering with the above method is that with big integer numbers (also known as arbitrary precision arithmetic), there is no largest base 10 value to start with. So the question is "How do you initialize the divisor to the largest possible base10 value if there is no way to know what that value is?"

What I Have Tried

Still trying to draft a solution.

Research

Some of the links that I have found here include the following:

Convert a "big" Hex number (string format) to a decimal number (string format) without BigInteger Class

C: print a BigInteger in base 10

Fastest way to convert a BigInteger to a decimal (Base 10) string?

A Google search turned up other things, but nothing that specifically answers my question.

Ideas

One method that I think that might work is as follows (in pseudo code):

Define p_divisor as previous divisor.
Set divisor = 1
  Loop:
    if divisor < dividend
      then
        Set p_divisor = divisor
        divisor = divisor * 10
      else
        end loop
  Loop:
    Divide number in question by divisor.
    Place result in a string at position length.
    Increment the length by 1.
    Divide the divisor by 10.
    if divisor == 1 then end loop
Reverse the string.
Print the string.

Would this be the correct way? I have a big integer library up and working (including multiplication and division) so it wouldn't be that hard to pull this off. The big issue that I see with this method is performance, because you have to run a multiplication sequence to get the initial divisor, then you have to divide twice for each base10 position. One for the actual division, and the other for the divisor.

Daniel Rudy
  • 2
    I'd repeatedly divide by 10, generating digits from the right. At the end, reverse the string. – melpomene Jun 04 '16 at 15:52
  • 3
    Rather than dividing by 10, then 100, then 1000, etc. just divide by 10 every time, and keep the remainder as your next digit -- and keep going until your dividend is 0. (and either reverse the string at the end or just build it in reverse in the first place -- if you can figure out how many bits are actually used in your number to start with, you can estimate the number of decimal digits you'll need fairly closely). – Dmitri Jun 04 '16 at 16:08
  • I think you mean dividing by 1000, then 100, then 10. I think I see where you are going with this. The remainder is the digit that gets placed in the string while the result is the new dividend. That's actually a good idea. Because the divisor is 10, the high-speed one-word divisor algorithm will burn through that in nothing flat. Additionally, since I am filling the string from the beginning, I don't think the string reversal step is needed. This should be an answer, not a comment. Thanks. – Daniel Rudy Jun 04 '16 at 16:28
  • 2nd method does not work in C. as `if divisor < dividend then Set p_divisor = divisor divisor = divisor * 10` will overflow whatever integer type used for large `dividend` – chux - Reinstate Monica Jun 04 '16 at 21:49

4 Answers

5

One (fairly common) way to do this, whether for big integer or normal integer types, is to repeatedly divide the number by 10, saving the remainder as the next digit (starting with the least significant). Keep going until the number reaches zero. Since the first digit found is the least significant, you may need to reverse the string at the end, or build it in reverse as you go.

An example using ordinary unsigned int might look like:

#include <limits.h>  // CHAR_BIT
#include <stdio.h>   // puts

void printUInt(unsigned x) {
  char buf[(sizeof(x) * CHAR_BIT) / 3 + 2]; // slightly oversize buffer
  char *result  = buf + sizeof(buf) - 1; // points at the most recently written char

  // add digits to result, starting at
  //   the end (least significant digit)

  *result = '\0'; // terminating null
  do {
    *--result = '0' + (x % 10);  // remainder gives the next digit
    x /= 10;
  } while (x); // keep going until x reaches zero

  puts(result);
}

The process is pretty much the same for a big integer -- though it would be best to do the division and find the remainder in one step if you can.

The above example builds the string from the end of the buffer (so result ends up pointing in the middle of the buffer somewhere), but you could also build it from the start and reverse it afterward.

You can estimate the size needed for the output if you can determine the number of bits used in your original number (about 1 additional digit per 3 bits -- slightly less).
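For a big integer, the same loop might look roughly like the sketch below. It assumes a hypothetical layout (an array of 32-bit limbs, least significant limb first); the names `divmod10` and `big_to_dec` are made up for illustration and are not from any particular library. The quotient and remainder come out of a single pass over the limbs.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: n limbs of 32 bits each, least significant limb first. */

/* Divide the number in place by 10 and return the remainder (one pass). */
static uint32_t divmod10(uint32_t *limbs, size_t n) {
    uint64_t rem = 0;
    for (size_t i = n; i-- > 0; ) {            /* most significant limb first */
        uint64_t cur = (rem << 32) | limbs[i]; /* rem < 10, so this cannot overflow */
        limbs[i] = (uint32_t)(cur / 10);
        rem = cur % 10;
    }
    return (uint32_t)rem;
}

/* Build the decimal string from the end of out (outsize bytes).
   Destroys the contents of limbs. Returns a pointer to the first digit
   inside out, or NULL if the buffer is too small. */
static char *big_to_dec(uint32_t *limbs, size_t n, char *out, size_t outsize) {
    if (outsize == 0) return NULL;
    char *p = out + outsize - 1;
    *p = '\0';
    do {
        if (p == out) return NULL;              /* no room for another digit */
        *--p = (char)('0' + divmod10(limbs, n));
        while (n > 0 && limbs[n - 1] == 0) n--; /* drop leading zero limbs */
    } while (n > 0);                            /* stop once the number is zero */
    return p;
}
```

Dropping the zero limbs at the top keeps each later pass a little shorter, but the overall work is still quadratic in the number of limbs.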

Dmitri
  • This does appear to be the answer. I will wait a couple of days to see if anyone else responds before I mark this as the answer. I am coding it now. From what I have found, it seems the number of decimal digits to bits is 5 digits for every 16 bits, which seems to work. I checked that conversion through 2^208 and it still works. I may edit the original post to add that little tidbit. – Daniel Rudy Jun 04 '16 at 20:39
  • Minor: 1) Could use `sizeof buf` rather than `sizeof (buf)`. 2) Rather than `/3`, could use `*28/93` or `*87/289` to approximate `log10(2)`. So `char buf[(sizeof x * CHAR_BIT) *28/93 + 2]; // right size buffer up to 92 bit` – chux - Reinstate Monica Jun 04 '16 at 21:45
2

The accepted answer already provides you with a simple way to do this. That works fine and gives you a nice result. However, if you really need to convert large values to a string, there is a better way.

I will not go into details, because my solution is written in Delphi, which many readers can't easily read, and it is pretty long (several functions in 100+ lines of code, using yet other functions, etc. which can not be explained in a simple answer, especially because the conversion handles some different number bases differently).

But the principle is to divide the number into two halves of almost equal size, using a divisor that is a power of 10. To convert these, recursively cut them into two smaller parts again, by a smaller power of 10, etc. until the size of the parts reaches some kind of lower limit (say, 32 bit), which you then finally convert the conventional way, i.e. like in the accepted answer.

The partial conversions are then "concatenated" (actually, the digits are placed into the single buffer at the correct address directly), so at the end, you get one huge string of digits.

This is a bit tricky, and I only mention it for those who want to investigate this for extremely large numbers. It doesn't make sense for numbers with fewer than, say, 100 digits.

This is a recursive method, indeed, but not one that simply divides by 10.
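To make the principle concrete without pulling in a big integer library, here is a toy sketch in C in which a plain `uint64_t` stands in for the big number. That substitution is purely an assumption for illustration, as are the names `pow10u`, `write_digits` and `u64_to_dec` and the cut-off of 4 digits; a real implementation would do the splitting with big integer division and cached powers of 10.

```c
#include <stdint.h>
#include <string.h>

/* 10^k for 0 <= k <= 19 (fits in uint64_t). */
static uint64_t pow10u(int k) {
    uint64_t p = 1;
    while (k-- > 0) p *= 10;
    return p;
}

/* Write exactly ndigits zero-padded decimal digits of x at out.
   Precondition: x < 10^ndigits. */
static void write_digits(uint64_t x, int ndigits, char *out) {
    if (ndigits <= 4) {                    /* small part: conventional loop */
        for (int i = ndigits - 1; i >= 0; i--) {
            out[i] = (char)('0' + x % 10);
            x /= 10;
        }
        return;
    }
    int low = ndigits / 2;                 /* digits in the low half */
    uint64_t p = pow10u(low);              /* split point: a power of 10 */
    write_digits(x / p, ndigits - low, out);         /* high half */
    write_digits(x % p, low, out + (ndigits - low)); /* low half, placed directly */
}

/* out needs room for 21 bytes (20 digits plus the terminator). */
static void u64_to_dec(uint64_t x, char *out) {
    char tmp[20];
    write_digits(x, 20, tmp);              /* any uint64_t is below 10^20 */
    int i = 0;
    while (i < 19 && tmp[i] == '0') i++;   /* skip leading zeros, keep one digit */
    memcpy(out, tmp + i, (size_t)(20 - i));
    out[20 - i] = '\0';
}
```

Each half is written straight into its final position in the buffer, which is the "concatenation" described above.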

The size of the buffer can be precalculated, by doing something like

bufSize = myBigInt.bitCount() * Math.log10(2) + some_extra_to_be_sure;

I use a precalculated table for the different number bases, but that is an implementation detail.
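For base 10 only, the estimate can be done in integer arithmetic. The following is a small sketch, not taken from my Delphi code: the function name `dec_digits_bound` is hypothetical, and 30103/100000 is used as a slightly-too-large stand-in for log10(2) so that the result never comes out too small.

```c
#include <stdint.h>

/* Upper bound on the number of decimal digits of an unsigned value that
   occupies at most `bits` bits. Uses 30103/100000, which is just above
   log10(2), so the bound errs on the large side. The caller still adds
   room for a sign and the terminating '\0'. Assumes bits * 30103 fits
   in 64 bits. */
static uint64_t dec_digits_bound(uint64_t bits) {
    return bits * 30103u / 100000u + 1;
}
```

For 16, 32 and 64 bits this gives 5, 10 and 20 digits, and for 208 bits it gives 63.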

For very large numbers, this will be much faster than a loop that repeatedly divides by 10, especially since that loop must divide the entire number by 10 every time, and the number only gets smaller very slowly. The divide-and-conquer algorithm only divides ever smaller numbers, and the total number of (costly) divisions to cut the parts is far lower (log N instead of N, is my guess). So there are fewer divisions, on (on average) much smaller numbers.

cf. Brent, Zimmermann, "Modern Computer Arithmetic", algorithm 1.26

My code and explanations can be found here, if you want to see how it works: BigIntegers unit

Rudy Velthuis
  • logN is the depth of your recursion tree, the number of divisions is still O(N), but the constant might well be much smaller in this approach. – Slava Aug 04 '16 at 14:09
  • I do know that for very large numbers, it is **much** faster, because the number of *costly* divisions (with large numbers) is greatly reduced. The naive algorithm constantly divides a very slowly reducing number by 10. So the total number of divisions may still be O(n), but most of them divide much smaller numbers. – Rudy Velthuis Aug 04 '16 at 16:41
0

I came across a similar problem and did not find any solution to my liking, so I came up with my own. The idea is to convert your BigInt, whatever its base, to another BigInt whose base is a power of 10, as large as possible but still smaller than your current base. That one you can convert "digit" by "digit" using system calls and concatenate the results. So no explicit division is ever involved, only division hidden in system library functions. Still, the overall complexity is quadratic (just like the other division-based solutions).

friend std::ostream& operator<<(std::ostream& out, const BigInt_impl& x){
    using Big10 = BigInt_impl<char32_t, uint64_t, 1000000000>; // 1e9 is the max power of 10 smaller than BASE
    auto big10 = Big10(0);
    auto cm = Big10(1);
    for(size_t i = 0; i < x.digits.size(); ++i, cm *= BASE){
        big10 += cm*x.digits[i];
    }
    out << big10.digits.back();
    for(auto it = next(big10.digits.rbegin()); it != big10.digits.rend(); ++it){ 
        out << std::setfill('0') << std::setw(9) << *it;
    }
    return out;
}

Watch out for the magic constant 1e9 in this solution - this is just for my case of BASE = 2^32. I was too lazy to derive it properly.

(and sorry for the C++, I just realized that the question was about C, but I would still like to leave the code here, maybe as an illustration of the idea)
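For C readers, the same idea might be sketched roughly as follows. This is not a translation of my class: the limb layout (32-bit limbs, least significant first) and the names `to_base1e9` and `format_dec` are assumptions made for the example. Unlike the C++ version it spells out the multiply-add with an explicit `/` and `%`, since there is no BigInt operator to hide them in.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define DEC_BASE 1000000000u  /* 10^9, largest power of 10 below 2^32 */

/* Convert n base-2^32 limbs (least significant first) into base-10^9
   digits in dec (capacity cap, least significant first).
   Returns the number of digits used (0 for the value zero),
   or SIZE_MAX if cap is too small. */
static size_t to_base1e9(const uint32_t *limbs, size_t n,
                         uint32_t *dec, size_t cap) {
    size_t used = 0;
    for (size_t i = n; i-- > 0; ) {       /* Horner: dec = dec * 2^32 + limb */
        uint64_t carry = limbs[i];
        for (size_t j = 0; j < used; j++) {
            uint64_t cur = ((uint64_t)dec[j] << 32) + carry;
            dec[j] = (uint32_t)(cur % DEC_BASE);
            carry = cur / DEC_BASE;       /* stays below 2^32 */
        }
        while (carry != 0) {              /* the number grew by a digit or two */
            if (used == cap) return SIZE_MAX;
            dec[used++] = (uint32_t)(carry % DEC_BASE);
            carry /= DEC_BASE;
        }
    }
    return used;
}

/* Print the base-10^9 digits: the top one as-is, the rest zero-padded
   to 9 places. Returns the string length, or -1 if out is too small. */
static int format_dec(const uint32_t *dec, size_t used,
                      char *out, size_t outsize) {
    if (outsize < 9 * used + 2) return -1;
    if (used == 0) { out[0] = '0'; out[1] = '\0'; return 1; }
    int len = snprintf(out, outsize, "%" PRIu32, dec[used - 1]);
    for (size_t i = used - 1; i-- > 0; ) {
        len += snprintf(out + len, outsize - (size_t)len, "%09" PRIu32, dec[i]);
    }
    return len;
}
```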

Slava
-1

Would this be the correct way?

The 2nd method does not work for all integer values in C. The test `if divisor < dividend` relies on creating a divisor that is a power of 10 greater than (or equal to) the dividend. Since most integer systems have a finite range, creating such a power of 10 when dividend == INTEGER_MAX is not possible (unless INTEGER_MAX is a power of 10).


A recursive method works by performing repeated division by 10 and deferring the digit assignment until the more significant digits are determined. This approach works well when the size of the destination buffer is unknown, yet adequate.

The below handles signed int and works for INT_MIN too without undefined behavior.

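#include <limits.h>  // CHAR_BIT, INT_MIN
#include <stdio.h>   // puts

// Return location of next char to write
// Note: value is expected to be <= 0
static char *itoa_helper(char *s, int value) {
  if (value/10) {
    s = itoa_helper(s, value/10);
  }
  *s = '0' - value % 10;  // C99: remainder has the sign of the dividend, so it is <= 0 here
  return s+1;
}

void itoa(int n, char *s) {
  if (n < 0) {
    *s++ = '-';
  } else {
    n = -n;  // work with non-positive values so INT_MIN never needs negating
  }
  *itoa_helper(s, n) = '\0';
}

// Room for the digits (about log10(2) per value bit), sign, and '\0'
#define INT_SIZEMAX  ((CHAR_BIT*sizeof(int) - 1)*28/93 + 3)

int main(void) {
  char buf[INT_SIZEMAX];
  itoa(INT_MIN, buf);
  puts(buf);
  return 0;
}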

Rather than converting negative numbers to positive ones, this code does the opposite, as -INT_MIN overflows (undefined behavior) on the common two's complement systems.

chux - Reinstate Monica
  • I don't like recursive methods. I have implemented one that doesn't use recursion. – Daniel Rudy Jun 05 '16 at 07:26
  • Use the right tool for the job. On the occasions where recursion is the best approach, bias against it results in using an axe for a saw or unfamiliarity in using the tool correctly. – chux - Reinstate Monica Jun 05 '16 at 19:01
  • 1
    Is recursion the right tool for the job here? I don't think so. This can be done in a simple loop. – Rudy Velthuis Jun 06 '16 at 08:57
  • Of course this can be done in a loop. yet a loop requires a prior knowledge of the maximum length of the string (other answer) or a subsequent loop to re-copy (posted algorithm). As mentioned in this answer: "works well when the size of the destination buffer is unknown, yet adequate." Depending on the needs of memory management of the string, stack space, code space (not stated in post), recursion can be the best answer. – chux - Reinstate Monica Jun 06 '16 at 15:49
  • It is very easy to determine the length of the string. Simply count the bits (that's easy) and multiply that with log10(2). Allocate a few digits more and you're safe. It is how I do it in the simple version of toString in my own BigInteger (in Delphi). The less simple way uses a divide and conquer algorithm, but the calculation of the buffer size is the same. It is even accurate for BigIntegers of several million digits. – Rudy Velthuis Jun 11 '16 at 14:37
  • @Rudy Velthuis Posting your comment as an answer would allow votes and feedback. Makes more sense there than as a comment to a recursive approach. – chux - Reinstate Monica Jun 11 '16 at 18:17
  • Considering that every 5 digits can be represented in 16 bits, 10 digits in 32 bits, and 20 digits in 64 bits, I have opted to use this conversion factor when converting between a bigint and a decimal string. When I do the conversion, I know what the length of the bigint is (at least the array allocation) so the conversion is easy. Then add 4 to the string size for safety and good to go. I have tested this and it works. Works the other way too. – Daniel Rudy Jun 18 '16 at 08:39
  • I tend to shy away from recursion due to stack space limitation. On the platform that I write software on, the stack is set for 8MB which isn't much, so I have learned to write algorithms using an iterative approach vs. a recursive approach. I am not saying that it is not useful, just personal preference. – Daniel Rudy Jun 18 '16 at 08:42
  • @Daniel Rudy Recursion here is well bounded, at most, about `log10(INT_MAX)` iterations. – chux - Reinstate Monica Jun 20 '16 at 01:16