10

I have a very big number, on the order of a thousand decimal digits, and I have to convert this to its binary representation. The numbers are stored as strings.

Since few languages have a basic data type to handle numbers this big, I see no easy way to turn this into an integral value for which I could convert it.

Could someone please help me out here? What would be a viable approach for doing this?

Boann
  • 48,794
  • 16
  • 117
  • 146
frodo
  • 1,561
  • 3
  • 21
  • 36
  • Think about the relation between 10^x and 2^x for a moment... – Junuxx Jun 13 '12 at 00:56
  • I'm not sure if there is any easy way. In Java BigInteger source code, it will convert several digits to integer and "accumulate" it into the binary representation by doing a multiplication and addition on big integer (i.e. you have to write multiplication + addition operations on big integer before you can do the string conversion). – nhahtdh Jun 13 '12 at 01:00
  • 1
    @mattegod. gee sorry if u feel that way. i've figured it out anyway. Thanks for all the great help. :P `code`int string_to_binary(int bin[],char s[]) { //int len = strlen(s); int res = 0; int bits = 0; int digits; for(digits = 0; s[digits] ; digits++ )s[digits] -= '0'; int firstdigit = 0; int remain; int i; while(firstdigit < digits) { for( i = firstdigit , remain = 0; i < digits ; i++) { remain = 10*remain + s[i]; s[i] = remain/2; remain %= 2; //printf("%d ",s[i]); } bin[bits++] = remain; while((firstdigit < digits) and !s[firstdigit]) firstdigit++; } return bits; } – frodo Jun 13 '12 at 01:57
  • 1
    @frodo - That doesn't look like a solution to the problem *that you stated*. For a start, it returns an `int` ... and the there's no way that `10^1000` fits in an `int`. – Stephen C Jun 13 '12 at 07:44
  • @StephenC. I am returning the number of bits (bin[bits++]). it is merely something i used later on. bin[] contains the binary number. You could try out this code, it works fine. – frodo Jun 13 '12 at 09:34

1 Answers1

55

If this is a genuine problem, there are plenty of BigNum libraries out there to assist, such as the MPIR library.

If it's something where you can't use a third-party library, it's still relatively easy. You don't actually need a complex BigNum library for this, you only need one operation: divide by two.

Here's how you do it. Start with an empty stack of binary digits. Then loop until the number is "0" (yes, that's still a string). If the last digit of the number is odd, push 1 on to the stack, otherwise push 0. Then divide the number by two and restart the loop.

Once the loop is finished (number is "0"), pop the digits off the stack one at a time and print them. There you go.


Oh, yeah, the divide-by-two, that is a rather important piece of the puzzle :-)

Let's start with "12345". Here's the process you follow, in pseudo-code.

Set next_additive to 0.
For every digit in number (starting at the left):
    Set additive to next_additive.
    If the digit is odd, set next_additive to 5, else set it to 0.
    Divide the digit by two (truncating) then add additive.
Remove leading zero if necessary (if it starts with 0 but is not just 0).

This can be done by processing the actual string one character at a time.

  • Starting with 1 (from 12345), additive is 0, number is odd, so next_additive is 5. Divide 1 by 2 and add additive of 0, you get 0: 02345.

  • Next digit 2, additive is 5, number is even, so next_additive is 0. Divide 2 by 2 and add additive of 5, you get 6: 06345.

  • Next digit 3, additive is 0, number is odd, so next_additive is 5. Divide 3 by 2 and add additive of 0, you get 1: 06145.

  • Next digit 4, additive is 5, number is even, so next_additive is 0. Divide 4 by 2 and add additive of 5, you get 7: 06175.

  • Next digit 5, additive is 0, number is odd, so next_additive is 5. Divide 5 by 2 and add additive of 0, you get 2: 06172.

  • Strip off leading zeros: 6172. Ignore the next additive since you're truncating the result.

And there you have it: 12345 / 2 = 6172.


By way of example, here's a Python approach to implementing this algorithm as follows. First the support routine for checking if a string-number is odd (keep in mind this isn't meant to be Pythonic code, it's just to show how it could be done - there's almost certainly better ways to do this in Python but that won't necessarily map well to another language):

def oddsToOne(s):
    if s.endswith('1'): return 1
    if s.endswith('3'): return 1
    if s.endswith('5'): return 1
    if s.endswith('7'): return 1
    if s.endswith('9'): return 1
    return 0

Then another support routine for dividing a string-number by two:

def divByTwo(s):
    new_s = ''
    add = 0

    for ch in s:
        new_dgt = (ord(ch) - ord('0')) // 2 + add
        new_s = '%s%d' % (new_s, new_dgt)
        add = oddsToOne(ch) * 5
    
    if new_s != '0' and new_s.startswith('0'):
        new_s = new_s[1:]
    
    return new_s

And, finally, some actual code to make a binary string from the decimal string:

num = '12345'

if num == '0':
    stack = '0'
else:
    stack = ''
    while num != '0':
        stack = '%d%s'%(oddsToOne(num), stack)
        num = divByTwo (num)

print(stack)

Note that if you wanted to actually use this to populate real bits (rather than make a string of bits), it's a simple matter to change what happens in the if and else clauses.

As stated, it's probably not the most efficient or beautiful Python code you could come up with but it's simply meant to show the process, not be some well-engineered production-ready piece of code. The output is (with some added stuff below to show what's going on):

12345
11000000111001
||      |||  |
||      |||  +-     1
||      ||+----     8
||      |+-----    16
||      +------    32
|+-------------  4096
+--------------  8192
                =====
                12345

Because this works on the string representation of the numbers, there is no arbitrary numeric limit such as the size of a 64-bit integer. Some example values are (slightly reformatted into 32-digit chunks for readability):

123456781234567812345678
=> 11010001001001001101100000011011
   01110110001110110010110110101111
   0111101001110

99999999999999999999999999999999
99999999999999999999999999999999
99999999999999999999999999999999
9999
=> 10010010010011010110100100101100
   10100110000110111110011101011000
   01011001001111000010011000100110
   01110000010111111001110001010110
   01110010000001000111000100001000
   11010011111001010101010110010010
   00011000010001010100000101110100
   01111000011111111111111111111111
   11111111111111111111111111111111
   11111111111111111111111111111111
   1111111111111
paxdiablo
  • 854,327
  • 234
  • 1,573
  • 1,953
  • Do you have any idea to convert the binary back to decimal string? – ivenxu Mar 21 '14 at 09:36
  • this wont work if the input decimal string is very large. Because if the decimal string is very large, it cannot be represented as a number and therefore it cannot be divided by 2. try this: "123456781234567812345678". – derek Mar 07 '16 at 17:04
  • 4
    @derek, *all* operations are performed on strings in this answer (see my text `yes, that's still a string`) so there is *no* arbitrary numeric limit placed on the values. The only limit is storage space. In fact, I'll show the output for your exact case (and a much bigger one). – paxdiablo Mar 08 '16 at 04:11
  • 1
    You are right. I did not notice you are operating on strings. Good solution! – derek Mar 08 '16 at 04:35
  • There is missing one small thing: if num == '0': stack = '0' – Jan Jůna Oct 30 '16 at 09:23
  • Good point, @JanJůna, added that in. – paxdiablo Oct 30 '16 at 12:30
  • I think your pseudocode is missing the use of `additive`. – Kerrek SB Apr 17 '17 at 23:00
  • @KerrekSB, good catch, I appear to have left it out of the pseudo-code yet still use it in the steps - have added that in. – paxdiablo Apr 18 '17 at 03:26
  • Is there full documentation to this algorithm? I can't seem to find one online... – Doot Jun 04 '20 at 18:04
  • This is brilliant! I needed to convert a long decimal string to a bunch of int64s and this method worked beautifully! – poby Jun 24 '20 at 06:19