
I'm trying to recreate printf and I'm currently trying to find a way to handle the conversion specifiers that deal with floats. More specifically: I'm trying to round doubles at a specific decimal place. Now I have the following code:

double  ft_round(double value, int precision)
{
    long long int power;
    long long int result;

    power = ft_power(10, precision);
    result = (long long int) (value * power);
    return ((double)result / power);
}

Which works for relatively small numbers (I haven't quite figured out whether printf compensates for truncation and rounding errors caused by it but that's another story). However, if I try a large number like

-154584942443242549.213565124235

I get -922337203685.4775391 as output, whereas printf itself gives me -154584942443242560.0000000 (precision for both outputs is 7).

Both aren't exactly the output I was expecting but I'm wondering if you can help me figure out how I can make my idea for rounding work with larger numbers.

My question is basically twofold:

  1. What exactly is happening in this case, both with my code and printf itself, that causes this output? (I'm pretty new to programming, sorry if it's a dumb question)
  2. Do you guys have any tips on how to make my code capable of handling these bigger numbers?

P.S. I know there are libraries and such to do the rounding but I'm looking for a reinventing-the-wheel type of answer here, just FYI!

Theo
    The rounding you see in the `printf` output probably happens when you assign the literal value to the variable because a double variable has a limited precision for the mantissa part. – Bodo Jul 04 '19 at 15:02
    "Both aren't exactly the output I was expecting" Are you expecting more than 19 mathematically correct digits in the output using `double`? If you want `-154584942443242549.213565124235` then you effectively want to re-create those "there are libraries". Not a trivial task. – chux - Reinstate Monica Jul 04 '19 at 15:07
    Your (nearest) comparisons diverge at the 17th decimal digit which is about right. You can't get more precision than the type will hold in 53 bits of significand. – Weather Vane Jul 04 '19 at 15:31
  • Thanks everyone for the responses! I'm going to try some of these suggestions and see if I can get a better approximation. I'm not planning on recreating a huge library just for this so I guess I have to accept some limitations but I'm just trying to stretch them a bit! Thanks for helping me understand some of the possible bottlenecks and solutions. – Laure Ravier Jul 04 '19 at 15:57
  • Nth duplicate of "is floating point math broken...." – Antti Haapala -- Слава Україні Jul 04 '19 at 16:27
    @AnttiHaapala: It's not broken, just not magic. It can't do things that are fundamentally mathematically impossible (storing values with thousands of bits of precision in 64 bits). – R.. GitHub STOP HELPING ICE Jul 04 '19 at 16:41

2 Answers


You can't round to a particular decimal precision with binary floating point arithmetic. It's simply not possible. At small magnitudes the errors are small enough that you can still get the right answer, but in general it doesn't work.

The only way to round a floating point number as decimal is to do all the arithmetic in decimal. Basically you start with the mantissa, converting it to decimal like an integer, then scale it by powers of 2 (the exponent) using decimal arithmetic. The amount of (decimal) precision you need to keep at each step is roughly (just a bit over) the final decimal precision you want. If you want an exact result, though, it's on the order of the base-2 exponent range (i.e. very large).

Typically rather than using base 10, implementations will use a base that's some large power of 10, since it's equivalent to work with but much faster. 1000000000 is a nice base because it fits in 32 bits and lets you treat your decimal representation as an array of 32-bit ints (comparable to how BCD lets you treat decimal representations as arrays of 4-bit nibbles).

My implementation in musl is dense but demonstrates this approach near-optimally and may be informative.

R.. GitHub STOP HELPING ICE
  • Thank you for this, I was looking for a better approach. I'll definitely check your implementation out! – Laure Ravier Jul 04 '19 at 16:45
  • Hi, thanks again for your answer! I was hoping you could clear something up for me about the size of your big array: how did you come to the calculation I've pasted below for the size needed? uint32_t big[(LDBL_MANT_DIG+28)/29 + 1 + (LDBL_MAX_EXP+LDBL_MANT_DIG+28+8)/9]. On my system, I have extended precision long doubles so I have to rewrite this but I'm not really following what's going on here. Any explanation would be very welcome! – Laure Ravier Jul 12 '19 at 16:40
  • @LaureRavier: The units of the `big` array are base-10^9 "digits" (typically called "limbs" in arbitrary-precision terminology). Each one holds at least 29 (almost 30) bits of the mantissa during initial mantissa expansion, then each halving (for a negative exponent) adds one decimal digit to the end (so each 9 halvings add a base-10^9 digit). Roughly every 3 also remove a decimal digit at the front, but that's ignored. In the other direction, doubling, it grows more slowly, roughly once every 3 doublings, but the end never shrinks. – R.. GitHub STOP HELPING ICE Jul 12 '19 at 17:32
  • I'm not sure why you think you need to edit it for "extended precision", though. Just plugging in the right values for your type's mantissa and exponent range should work. Of course if the exponent range is huge you won't necessarily be able to fit the exact converted value in memory... – R.. GitHub STOP HELPING ICE Jul 12 '19 at 17:33
  1. What exactly is happening in this case, both with my code and printf itself, that causes this output?

Overflow. Either ft_power(10, precision) exceeds LLONG_MAX and/or value * power > LLONG_MAX.

  2. Do you guys have any tips on how to make my code capable of handling these bigger numbers?

Set aside the various integer types for rounding/truncation. Use FP routines like round(), nearbyint(), etc.

double ft_round(double value, int precision) {
    // Use a re-coded `ft_power()` that computes/returns `double`
    double pwr = ft_power(10, precision);
    return round(value * pwr)/pwr;
}

As mentioned in this answer, floating-point numbers have binary characteristics as well as finite precision. Using only double will extend the range of acceptable behavior. At extreme precision, the value computed with this code may be close to, yet not exactly, the desired result.

Using temporary wider math will extend the acceptable range.

double ft_round(double value, int precision) {
    double pwr = ft_power(10, precision);
    return (double) (roundl((long double) value * pwr)/pwr);
}

I haven't quite figured out whether printf compensates for truncation and rounding errors caused by it but that's another story

See Printf width specifier to maintain precision of floating-point value to print FP with enough precision.

chux - Reinstate Monica