3

I have looked for a while to find an algorithm which converts integers to string. My requirement is to do this manually as I am using my own large number type. I have + - * /(with remainder) defined, but need to find a way to print a single number from a double int (high and low, if int is 64bits, 128bits total).

I have seen some answers such as

Convert integer to string without access to libraries

Converting a big integer to decimal string

but was wondering if a faster algorithm was possible. I am open to working with bits directly(e.g. base2 to base10-string - I could not find such an algorithm however), but I was just hoping to avoid repeated division by 10 for numbers possibly as large as 2^128.

Grzegorz
  • 107
  • 8
  • I don't think there is a way, since at least you have to traverse all digits, which division by 10 does – ZhaoGang Dec 14 '18 at 10:29
  • 1
    It seems pretty difficult since 10 doesn't have a friendly representation in binary. I don't see how you're going to avoid divisions, but I'm not an expert in binary arithmetic at all so... – Dici Dec 14 '18 at 11:53
  • That's what I feared... Especially what you say @Dici, since base10 is not in the line of powers of 2, it can't easily be translated... Oh well – Grzegorz Dec 14 '18 at 12:18
  • 1
    As it's just 128 bits and not thousands of bits, most probably it's not worth searching for the most elaborated algorithm. When your overall application is done, check if you have a performance problem, and if yes, use a profiler to find its source. If that really happens to be the conversion to string, then try some other algorithms. – Ralf Kleberhoff Dec 14 '18 at 13:48
  • I recently came up with an algorithm that works especially well with large integers, because it reduces the problem recursively: https://stackoverflow.com/a/53644080/5987 – Mark Ransom Dec 14 '18 at 14:35

3 Answers3

3

You can use divide-and-conquer in such a way that the parts can be converted to string using your standard library (which will typically be quite efficient at that job).

So instead of dividing by 10 in every iteration, you can e.g. divide by 10**15, and have your library convert the chunks to 15-digit strings. After at most three steps, you're finished.

Of course you have to do some string manipulation regarding the zero-padding. But maybe your library can help you here as well, if you use something like a %015d zero-padding format for all the lower parts, and for the highest non-zero part use a non-padding %d format.

Ralf Kleberhoff
  • 6,990
  • 1
  • 13
  • 7
  • 1
    A 128-bit number can be up to 39 decimal digits, so you'd only need a maximum of 3 long divisions to produce 13-digit intermediate results. You could go up to 18 digits while staying below 64 bits, which might allow an optimization when the number is less than 10**36. – Mark Ransom Dec 14 '18 at 15:54
  • 1
    @MarkRansom You're right, you can go up to 18 digits, thus increasing the chance to finish with one or two instead of three iterations, even 19 digits if you can use an unsigned representation. – Ralf Kleberhoff Dec 14 '18 at 16:49
  • Sorry, I misspoke - you only need *two* division+remainder operations to get all 3 parts. With threshold testing you can do it with 0, 1, or 2 depending on the magnitude of the number. – Mark Ransom Dec 14 '18 at 17:12
0

You may try your luck with a contrived method, as follows.

Numbers can be represented using the Binary-Coded Decimal representation. In this representation, every decimal digit is stored on 4 bits, and when performing an addition, if the sum of two digits exceeds 9, you add 6 and carry to the left.

If you have pre-stored the BCD representation of all powers of 2, then it takes at most 128 additions to perform the conversion. You can spare a little by the fact that for low powers, you don't need full length addition (39 digits).

But this sounds as a lot of operations. You can "parallelize" them by packing several BCD digits in an single integer: an integer addition on 32 bits is equivalent to 8 simultaneaous BCD digit additions. But we have a problem with the carries. To work around, we can store the digits on 5 bits instead of 4, and the carries will appear in the fifth bit. Then we can obtain the carries by masking, add them to the next digits (shift left 5), and adjust the digit sums (multiply by 10 and subtract).

  2 3 4 5 6
+ 7 6 9 2 1
= 9 913 7 7

Carries:

 0-0-1-0-0

Adjustments:

  9 913 7 7
-0000010000
= 9 9 3 7 7

Actually, you have to handle possible cascaded carries, so the sum will involve the two addends and carries in, and generate a sum and carries out.

32 bits operations allow you to process 6 digits at a time (7 rounds for 39 digits), and 64 bits operations, 12 digits (4 rounds for 39 digits).

0
  1. if you want to just encode your numbers as string

    use hex numbers that is fast as you can cast all the digits just by bit operations ... also using Base64 encoding is doable just by bit operations + the translation table. Booth representations can be done on small int arithmetics only in O(n) where n is the count of printed digits.

  2. If you need base10

    then print a hex string and convert it to decimal on strings like this:

    this is much slower than #1 but still doable on small int arithmetics ... You can do this also in reverse (input number from string) by using dec2hex ...

For bigint libs there are also another ways of easing up the string/integer conversions:

  1. BCD

    binary coded decimal ... the number printed as hex is the decadic number. So each digit has 4 bits. This waste some memory but many CPU's has BCD support and can do operations on such integers natively.

  2. Base 10^n

    sometimes is used base 10^n instead of 2^m while

    10^n <= 2^m
    

    The m is bitwidth of your atomic integer and n i snumber of decadic digits that fits inside it.

    for example if your atomic unsigned integer is 16 bit it can hold up to 65536 values in base 2. If you use base 10000 instead you can print each atom asa decadic number with zeropad from left and simply stack all such prints together.

    This also waste some memory but usually not too much (if the bitwidth is reasonably selected) and you can use standard instructions on the integers. Only the Carry propagation will change a bit...

    for example for 32bit words:

    2^32 = 4294967296 >= 1000000000
    

    so we wasted log2(4.2949...) = ~2.1 bits per each 32 bits. This is much better than BCD log2(16/10)*(32/4)= ~5.42 bits And usually even better with higher bit widths

Spektre
  • 49,595
  • 11
  • 110
  • 380