
I have an array of bytes and the length of that array. The goal is to output a string containing that number represented as a base-10 number.

My array is little-endian, meaning the first byte (arr[0]) is the least significant byte. Here is an example:

#include <iostream>
#include <string>
using namespace std;

typedef unsigned char Byte;

// the function I want to implement
string hexToDec(const Byte *arr, int len);

int main(){
  int len = 5;
  Byte *arr = new Byte[len];
  int i = 0;

  arr[i++] = 0x12;
  arr[i++] = 0x34;
  arr[i++] = 0x56;
  arr[i++] = 0x78;
  arr[i++] = 0x9A;

  cout << hexToDec(arr, len) << endl;

  delete[] arr;
}

The array consists of [0x12, 0x34, 0x56, 0x78, 0x9A]. The function hexToDec, which I want to implement, should return 663443878930, which is that number in decimal.

But the problem is that my machine is 32-bit, so it instead outputs 2018915346 (notice that this number comes from integer overflow). The problem is that I am using the naive way: iterating over the array, raising 256 to the power of the position in the array, multiplying that by the byte at that position, and finally adding the result to the sum. This of course yields integer overflow.

I also tried long long int, but at some point, of course, integer overflow occurs again.

The arrays I want to represent as decimal numbers can be very long (more than 1000 bytes), which definitely requires a much cleverer algorithm than my naive one.

Question

What would be a good algorithm to achieve that? Also, what is the optimal complexity of such an algorithm? Can it be done in linear complexity O(n), where n is the length of the array? I really cannot think of a good idea. Implementation is not the problem; my lack of ideas is.

Advice or an idea of how to do that will be enough. But if it is easier to explain using some code, feel free to write it in C++.

  • You might find help in using a bigint lib/class; use your favorite search engine. – Yunnosch Jul 23 '17 at 06:19
  • @Yunnosch. My goal is to implement it from scratch, not to use built-in libraries or classes. C++ is just an example language; I'm interested in the algorithm. –  Jul 23 '17 at 06:21
  • Tough, I hope it is for learning, not for solving a bigger problem. I think "divide like taught with pencil and paper in school" would be "an" algorithm. Probably not the best one ... I.e. based on the knowledge of the size of the array (i.e. power of 16), find the lowest power of 10 which is higher than the value you are looking for, then get the remainder of dividing by the next lower power of ten. Repeat. – Yunnosch Jul 23 '17 at 06:29
  • @Yunnosch. *"Find the lowest power of 10 which is higher than the value you are looking for*" - wouldn't that cause an integer overflow too? –  Jul 23 '17 at 06:33
  • Not if you use the exponent. You need to stay within the datatype array of byte for everything else. I.e. use a lookup table of arrays of bytes for the values (or generate them on the fly once you have decided on the exponent). – Yunnosch Jul 23 '17 at 06:34
  • Would you accept an (otherwise hopefully satisfactory) answer in pseudo code? Describing in comment-formatting is probably going to be cumbersome... – Yunnosch Jul 23 '17 at 06:38
  • @Yunnosch. Well considering that I'm still trying to understand your *"Divide with pencil and paper"* approach, it would be great if you provide a pseudo code. Also, what is the complexity of your method? –  Jul 23 '17 at 06:41
  • Sorry, I suck at correctly judging complexities. Also, it would probably better match StackOverflow concepts to make that a separate follow-up question. And I definitely do not say it is the best/fastest algorithm. For that I refer to a bigint implementation. You could of course get one and look at the source code, assuming that some are available in source. – Yunnosch Jul 23 '17 at 06:43
  • Damn, even making pseudocode for this is more than I thought. You need to split it into smaller problems: a) making an arrayOfHex representation of a high power of 10, b) multiplying that by a "digit" (base 10 or better base 16), c) subtracting one arrayOfHex from another. A trick to simplify that is the fact that the size of your array is a good estimate of the value, by giving you the lowest power of 16 which is higher than the value. Another is that 10 to the power of (N * log(16)/log(10)) is roughly equal to 16 to the power of N. Good luck. – Yunnosch Jul 23 '17 at 07:15
  • My respect to anybody or any team to ever achieve a working bigint implementation. – Yunnosch Jul 23 '17 at 07:19
  • Is the number unsigned? – mksteve Jul 23 '17 at 07:38
  • @Yunnosch Implementing small bigints is easy; the problems start once the bit width crosses `16000` bits, especially for stuff like `mul,div,pow,log,exp`. For printing in decadic you need to continuously divide and mod by 10 ... obtaining digits from LSD to MSD until the subresult is zero. Then just print them in reverse order ... The link in my Answer does it without big int math ... on its own arrays – Spektre Jul 23 '17 at 07:43
  • The question on complexity is answered here: https://stackoverflow.com/q/28418332/509868; a quadratic algorithm might be easier to implement than the fastest one. – anatolyg Jul 23 '17 at 07:54

3 Answers

0

You can and cannot achieve this in O(n); it all depends on the internal representation of your number.

  1. For a truly binary form (a power-of-2 base like 256)

    this is not solvable in O(n). The HEX print of such a number is O(n), however, and a HEX string can be converted to decadic and back.

    Creating the HEX string does not require bignum math: you just print the array consecutively from MSW to LSW in HEX. That is O(n), but the conversion to DEC is not.

    To print a bigint in decadic you need to continuously mod/div it by 10, obtaining digits from LSD to MSD, until the sub-result is zero. Then just print them in reverse order. The division and modulus can be done at once, as they are the same operation. So if your number has N decadic digits, you need N bigint divisions. Each bigint division can be done, for example, by binary division, so we need log2(n) bit shifts and subtractions, which are all O(n), so the complexity of the naive bigint print is:

    O(N.n.log2(n))
    

    We can compute N from n by logarithms so for BYTEs:

    N = log10(base^n)
      = log10(2^(8.n))
      = log2(2^(8.n))/log2(10)
      = 8.n/log2(10)
      = 8.n*0.30102999
      = 2.40824.n
    

    So the complexity will be:

    O(2.40824.n.n.log2(n)) = O(n^2.log2(n))
    

    This is insane for really big numbers.

  2. Power-of-10 base binary form

    To do this in O(n) you need to slightly change the base of your number. It will still be represented in binary form, but the base will be a power of 10.

    For example, if your number is represented by 16-bit WORDs, you can use the highest power-of-10 base which still fits, which is 10000 (the 16-bit max is 65535). Now printing in decadic is easy: just print each word in your array consecutively from MSW to LSW, padding the inner words with leading zeros to 4 digits (a minimal sketch follows below).

    Example:

    let's have the big number 1234567890 stored as BYTEs with base 100, where the MSW goes first. The number will be stored as follows:

    BYTE x[] = { 12, 34, 56, 78, 90 }
    

    But as you can see, while using BYTEs with base 100 we are wasting space, as only 100/256 ≈ 39% of the full BYTE range is used. The operations on such numbers are slightly different than in raw binary form, as we need to handle overflow/underflow and the carry flag differently.

  3. BCD (binary coded decimal)

    There is also another option, which is to use BCD (binary coded decimal). It is almost the same as the previous option, but base 10 is used for a single digit of the number... each nibble (4 bits) contains exactly one digit. Processors usually have instructions for this number representation. The usage is like for binary encoded numbers, but after each arithmetic operation a BCD recovery instruction such as DAA is called, which uses the Carry and Auxiliary Carry flag states to recover the BCD encoding of the result. To print a BCD value in decadic you just print the value as HEX. Our number from the previous example would be encoded in BCD like this:

    BYTE x[] = { 0x12, 0x34, 0x56, 0x78, 0x90 }
    

Of course, both #2 and #3 make the HEX print of your number in O(n) impossible.
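
For illustration, here is a minimal sketch of the #2 style print: the number is kept in 16-bit WORDs using base 10000, MSW first, and the decimal output is one pass over the array. (The std::vector container and the names are just my choice for the example, not from any particular bignum implementation.)

#include <iostream>
#include <iomanip>
#include <vector>

typedef unsigned short Word;   // 16-bit limb holding one base-10000 "digit" (0..9999)

// Decimal print of a base-10000 number stored MSW first:
// every limb maps to exactly 4 decimal digits, so one pass suffices -> O(n).
void printDec(const std::vector<Word> &limbs)
{
    if (limbs.empty()) { std::cout << "0\n"; return; }
    std::cout << limbs[0];                              // leading limb without padding
    for (std::size_t i = 1; i < limbs.size(); ++i)
        std::cout << std::setw(4) << std::setfill('0')  // inner limbs zero-padded to 4 digits
                  << limbs[i];
    std::cout << '\n';
}

int main()
{
    std::vector<Word> x = { 12, 3456, 7890 };   // 1234567890 in base 10000, MSW first
    printDec(x);                                // prints 1234567890
}

The print itself is O(n); the price is that all arithmetic on the number has to be carried out in base 10000, with the overflow/carry handling mentioned above.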

Spektre
  • Your script from the link you provided outputs `098083` for hex number `9A78563412`. –  Jul 23 '17 at 08:02
  • @M__6littleeggs did you use `str_hex2dec("9A78563412h");`? The script is tested. Are you running it in Borland? If not, maybe you mis-ported the `AnsiString`; it starts indexing from `1`, not from `0`. I am going to check it now for your number ... – Spektre Jul 23 '17 at 08:05
  • @M__6littleeggs it is correct it outputs `663443878930` – Spektre Jul 23 '17 at 08:08
  • It seems you're right. The problem was that I just replaced `AnsiString` with `std::string` and `Length` with `length`; I didn't know the indexes are shifted to `1`. It works, and btw it is very well explained. Your code is easy to understand. –  Jul 23 '17 at 08:15
  • @M__6littleeggs take a look at the links here [Arithmetic with base 256 number system](https://stackoverflow.com/a/45233444/2521214) as a next step for your bignums. It is a collection of a few of my SO Q&As about bignum algorithms (not all of them, just a few) – Spektre Jul 23 '17 at 08:24
  • @greybeard :) repaired (you could have edited it on your own ... ) – Spektre Jul 23 '17 at 12:28
  • Weird. I'm wondering why this answer has 0 upvotes. This is **1)** a correct answer, **2)** well explained, and the code looks clear and intuitive, **3)** your solution exactly answers the OP's question. Therefore, the lack of upvotes really confuses me. –  Nov 16 '17 at 20:23
  • @ThePirateBay my bet is that's mostly because this was accepted as the answer too quickly, which means that a lot of users skip reading it, leading to far fewer views and hence a lot fewer votes. – Spektre Nov 16 '17 at 21:01
0

The number you posted, 0x9a78563412, represented in little-endian format as you have done, can be converted to a proper uint64_t with the following code:

#include <iostream>
#include <stdint.h>

int main()
{
    uint64_t my_number = 0;
    const int base = 0x100; /* base 256 */
    uint8_t array[] = { 0x12, 0x34, 0x56, 0x78, 0x9a };

    /* go from right to left, as it is little endian */
    for (int i = sizeof array; i > 0;) {
        my_number *= base;
        my_number += array[--i];
    }
    std::cout << my_number << std::endl; /* conversion uses 10 base by default */
}

A sample run gives:

$ num
663443878930

As we are in a base that is an exact power of 2, we can optimize the code by using

my_number <<= 8; /* left shift by 8 */
my_number |= array[--i]; /* bit or */

As these operations are simpler than integer multiplication and addition, some (but not much) efficiency improvement can be expected from doing it that way. It may be more expressive to leave it as in the first example, though, as that better represents an arbitrary base conversion.

Luis Colorado
-1

You'll need to brush up on your elementary school skills and implement long division.

I think you'd be better off implementing the long division in base 16 (divide the number by 0x0A each iteration). Take the remainder of each division - these are your decimal digits (the first one is the least significant digit).
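
For illustration, here is a minimal sketch of that long division, working directly on the bytes (base 256, one byte at a time, rather than on individual hex digits) and assuming the little-endian array and the hexToDec(arr, len) interface from the question:

#include <iostream>
#include <string>
#include <vector>
#include <algorithm>

typedef unsigned char Byte;

// Sketch: repeatedly divide the whole little-endian array by 10;
// each pass yields one decimal digit (the remainder), least significant first.
std::string hexToDec(const Byte *arr, int len)
{
    std::vector<Byte> num(arr, arr + len);   // working copy so the input stays intact
    std::string digits;

    bool nonZero = true;
    while (nonZero) {
        nonZero = false;
        unsigned rem = 0;
        // one long-division pass by 10, most significant byte first
        for (int i = len - 1; i >= 0; --i) {
            unsigned cur = rem * 256 + num[i];
            num[i] = static_cast<Byte>(cur / 10);
            rem = cur % 10;
            if (num[i] != 0) nonZero = true;
        }
        digits.push_back(static_cast<char>('0' + rem));  // next decimal digit
    }
    std::reverse(digits.begin(), digits.end());          // digits were produced LSD first
    return digits;
}

int main()
{
    Byte arr[] = { 0x12, 0x34, 0x56, 0x78, 0x9A };
    std::cout << hexToDec(arr, 5) << std::endl;   // prints 663443878930
}

Each pass is O(n) and produces one decimal digit, and there are about 2.4·n digits for n bytes, so the whole conversion is quadratic - not optimal, but simple to get right.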

zmbq