Assuming the maximum possible value is less than 1E154, and thus all values fit in 512 bits, than I suspect the answer might be:
- Precalculate all possible
powers_of_10
in a static const array. (#ops ~0)
- If all numbers fit in 2 YMM registers, that means the max supported value is 1E154, which means the lookup table of powers of 10 would take ~9856 bytes.
- In your number, calculate the number of leading binary zeroes (Using
clz
as well) (#ops < ~10)
- Use
(max-bits - number_of_leading_zeroes) / 3.32192809489
to give a good estimate of the final number of decimal digits. This is also a good estimate a close power of 10. (#ops ~2)
- Iterate the
powers_of_10
from that estimate until you find the largest power-of-10 less than your value. (#ops ~8)
- If you're willing to sacrifice accuracy in the exponent, then this step can be skipped.
- Add that power of 10 to itself in a loop until greater than your input. (#ops < ~100) (10 additions of 10
uint64
s)
- If you're willing to sacrifice minor accuracy in the mantissa, then
double(input)/double(power_of_ten)
will do it in a single division.
- There's lots of shortcuts that you can take in the bigint->double conversion.
- Emit
loop_count-1
E
power_of_ten_index
. (#ops ~4)
If you're willing to sacrifice accuracy in both exponent and mantissa, that leaves ~16 operations that completely ignore the low bits.
Performance
Without writing out a final implementation, performance is hard to guess at, and with a large LUT, cache, and therefore the rest of the program becomes a factor, but here's preliminary data: https://quick-bench.com/q/53k-xSQz7y4iCO7ny66Dz2w62dQ (I ran it several times to try to eliminate outliers)
In my tests, the fastest combination seems to (unsurprisingly) be:
- Do not iterate the
powers_of_ten
to confirm the exponent.
- Do not exactly calculate the mantissa.
- Use floats to guess the mantissa (preferring speed to accuracy)
From this baseline, we can see:
- Iterating the
powers_of_ten
seems to have no noticeable impact on time on average, as long as you also use floats to guess the mantissa. If you do not use floats to guess the mantissa, then the mantissa calculation will take significantly longer. This implies it is not adding significant accuracy on average, and can be skipped to minimize code size.
- Using floats to guess the mantissa seems to make the algorithm ~5% faster on average, with no impact on accuracy.
- Iterating to find the exact mantissa slows the algorithm about 16%, but also implies that it's adding accuracy, so I assume you want to keep that.
- I had two variants of my bigint->double conversion code when estimating the mantissa. One version only ensured the double had at least 1 significant bit minimum, and the other version ensured that the double had >4 significant bits. The additional code to ensure more significant bits had no noticeable impact on time on average, so the additional code/accuracy at that step did not result in noticeably more performance in finding the exact mantissa, and I assume you want to skip that, to minimize code size.
It is also interesting to note that all of these are ~4x faster than converting the bigint to a double
and then using printf("%.0E",value);
(I do not vouch for the accuracy of the results of any of this code)