Preface
I'm learning about computer math by writing and refining my own BigInt library. So far my first incarnation stores every digit of a base 10 number in successive elements of a vector. It can multiply and add with arbitrary precision. I want to speed it up by using all of the space available to me in a standard C++ data type by converting to base 2^x.
Information
I am reading 1000 or more digits from stdin in base 10 and I wish to convert them to base 2^x so I may store them easily in an array or vector of one of the standard C++ data types, likely unsigned int. I have only one idea for how to do base conversion, the repeated division with remainder method. Here is some C++ code describing that method:
vector<int> digits;
while(num!=0) {
int mod = num%base;
num = num/base;
digits.push_back(mod);
}
Conundrums
Some things I'm lost on are whether or not division with remainder is the proper way to do radix conversion on large integers. I've tried looking at how the GMP library does it. gmp/mpn/generic/set_str.c is the relevant c source file where "the magic" happens, but I am not certain what is going on in there. Matt McCutchen's BigInt appears to use the repeated division with remainder method. If I do use this method, I essentially need write two versions of my BigInt class, one for working in Base10 and the other for Base2^x.
Conclusion
- Offer advice on the proper steps for converting a huge number from a string to an array of 32 bit words.
- Help me learn how GMP converts a string to an array of 32bit words without wading through many layers of abstraction.
Examples Using 4 bit word size
Number we want to store (obviously on the small size): 123456789
unsigned chars have a range of 0-255, if we want to split up our number and store it in the vector we can do it one of three ways:
- As base 10, our vector looks like: [1,2,3,4,5,6,7,8,9]
- This is what my vector looked like in my first implementation.
- As base 100, our vector looks like: [1,23,45,67,89]
- Easy to convert from base 10 to base 100, has ciel(digits in base10/2) elements.
- As base 256, our vector looks like: [7,91,205,21]
Obviously the third solution is the most optimal for internal representation, and is what I am trying to get to.