4

I'm trying to implement BigInt and have read some threads and articles regarding it, most of them suggested to use higher bases (256 or 232 or even 264).

Why higher bases are good for this purpose?

Other question I have is how am I supposed to convert a string into higher base (>16). I read there is no standard way, except for base64. And the last question, how do I use those higher bases. Some examples would be great.

phuclv
  • 37,963
  • 15
  • 156
  • 475
questions
  • 2,337
  • 4
  • 24
  • 39
  • 2
    Your two questions have nothing to do with each other; they would have been better off in separate threads. Also, your second question doesn't make sense. BASE64 is not a base. – Mr Lister Apr 16 '12 at 16:40
  • @MrLister: BASE64 encoding may not be a base, but it uses a radix-64 alphabet and could be used for true `b=64` numbers. – user7116 Apr 16 '12 at 19:44

2 Answers2

10

The CPU cycles spent multiplying or adding a number that fits in a register tend to be identical. So you will get the least number of iterations, and best performance, by using up the whole register. That is, on a 32-bit architecture, make your base unit 32 bits, and on a 64-bit architecture, make it 64 bits. Otherwise--say, if you only fill up 8 bits of your 32 bit register--you are wasting cycles.

StilesCrisis
  • 15,972
  • 4
  • 39
  • 62
  • You mean using a word(32bits or 64bits) as base? – questions Apr 16 '12 at 16:40
  • Yes, using the native register size for your architecture will be most efficient. Say your native registers are 64 bits; each 64-bit operation will take constant time. Making some of the bits zero (if you use only the low 8 bits, the top 36 are zero) doesn't change that constant time, but you now need to issue eight times more instructions. – Useless Apr 16 '12 at 16:43
  • You might want to use base 2^30 or 2^31 rather than 2^32, in order to implement "carrying" without having to use assembly language. – dan04 Apr 16 '12 at 16:44
  • 1
    Usually it is better to use half the bits than the word size (if you support multiplication), which ensures that each operation can be performed internally without having to check for potential overflows up front and rather normalize the representation after the operation has completed. – David Rodríguez - dribeas Apr 16 '12 at 16:46
  • 1
    @DavidRodríguez: I tend to lean towards always using 32 bits. This is because you can be sure of the presence of a 64-bit data type to store the result, and on sane archs, a 32x32 multiply results in a 64-bit value anyway. Similarly a 64x64 multiply on a 64-bit machine usually results in a 128-bit result, but since most C implementations do not have a 128-bit type to represent it in, there's no portable way to get at the upper bits. – R.. GitHub STOP HELPING ICE Apr 16 '12 at 16:57
  • @R..: That's a little presumptive. Most architectures have some way of calculating the high word of a 32x32 multiply, but I don't think it's common that you MUST calculate it. AFAIK that's an Intel thing. For instance, in PowerPC (think every console platform), calculating the high half of a 32x32 multiply requires a separate instruction. In fact, a good optimizing compiler will omit the low half of the multiply if you only use the high part. e.g. `uint32 a = (uint64(b) * uint64(c)) >> 32;` will generate the `mulhw` [mul high-word] but not the `mullw` [mul low-word]. – StilesCrisis Apr 16 '12 at 17:18
  • [most 64-bit compilers nowadays (except MSVC) have a 128-bit type](https://stackoverflow.com/q/3329541/995714) – phuclv Jan 14 '18 at 07:58
2
  1. first answer stated this best. I personally use base 2^16 to keep from overflowing in multiplication. This allows any two digits to be multiplied together once without ever overflowing.

  2. converting to a higher base requires a fast division method as well as packing the numbers as much as possible (assuming ur BigInt lib can handle multiple bases).

Consider base 10 -> base 2. The actual conversions would be 10 -> 10000 -> 32768 -> 2. This may seem slower, but converting from base 10 to 10000 is super fast. The amount of iterations for converting between 10000 and 32768 is very fast as there are very few digits to iterate over. Unpacking 32768 to 2 is also extremely fast.

So first pack the base to the largest base it can go to. To do this, just combine the digits together. This base should be <= 2^16 to prevent overflow.

Next, combine the digits together until they are >= the target base. From here, divide by the target base using a fast division algorithm that would normally overflow in any other scenario.

Some quick code

if (!packed) pack()

from = remake() //moves all of the digits on the current BigInt to a new one, O(1)
loop
    addNode()

    loop
        remainder = 0
        remainder = remainder*fromBase + from.digit
        enter code here`exitwhen remainder >= toBase
        set from = from.prev
        exitwhen from.head

    if (from.head) node.digit = remainder
    else node.digit = from.fastdiv(fromBase, toBase, remainder)

    exitwhen from.head

A look at fast division

loop
    digit = remainder/divide
    remainder = remainder%divide

    //gather digits to divide again
    loop
        this = prev
        if (head) return remainder

        remainder = remainder*base + digit
        exitwhen remainder >= divide

        digit = 0

return remainder

Finally, unpack if you should unpack

Packing is just combining the digits together.

Example of base 10 to base 10000

4th*10 + 3rd
*10 + 2nd
*10 + 1st
  1. ???

You should have a Base class that stores alphabet + size for toString. If the Base is invalid, then just display the digits in a comma separated list.

All of your methods should be using the BigInt's current base, not some constant.

That's all?

From there, you'll be able to do things like

BigInt i = BigInt.convertString("1234567", Base["0123456789"])

Where the [] is overloaded and will either create a new base or return the already created base.

You'll also be able to do things like

i.toString()
i.base = Base["0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"]
i.base = 256
i.base = 45000

etc ^_^.

Also, if you are using integers and you want to be able to return BigInt remainders, division can get a bit tricky =P, but it's still pretty easy ^_^.

This is a BigInt library I coded in vjass (yes, for Warcraft 3, lol, don't judge me)

Things like TriggerEvaluate(evalBase) are just to keep the threads from crashing (evil operation limit).

Good luck :D

phuclv
  • 37,963
  • 15
  • 156
  • 475
nestharus
  • 217
  • 2
  • 11
  • 2^16 is too preservative. Almost all 32-bit compilers have a 64-bit type so using 32-bit limb would not be a pain. In 64-bit compilers there are also a 128-bit type – phuclv Jan 14 '18 at 08:02