Where and how is the accuracy lost by doing in C integer multiplication and division and storing the result in integer or float?

Question

For a calculation in a C program, running on ESP32, I have to multiply and divide the following integers in the following way :

150 × 10000 ÷ 155 ÷ 138 × 100 ÷ 220 × 100 which produces 3100.000000 for a float variable and 3100 for a 32-bit unsigned integer.

I tried to test the result of the calculation on https://www.onlinegdb.com/ using the following code :

#include <stdio.h>
#include <stdint.h>

int main () {
    float calc = 150 * 10000 / 155 / 138 * 100 / 220 * 100 ;
    printf ( "calc = %f\n", calc ) ;    // 3100.000000

    uint32_t calc0 = 150 * 10000 / 155 / 138 * 100 / 220 * 100 ;
    printf ( "calc0 = %u\n", calc0 ) ;  // 3100
}

which again produces 3100.000000 for the float and 3100 for the 32-bit unsigned integer.

If I enter the same numbers into the calculator on my phone or laptop, the result in both cases is 3187,555782226.

So, I have an accuracy loss on the ESP32 of ( if I haven't messed up the formula ) ca. (3187−3100)÷3187×100 ~= 2,73 %

Where does the difference come from, and is it possible to get the same result on a 32-bit microcontroller as on the PC ?

This is likely due to the expression `150 * 10000 / 155 / 138 * 100 / 220 * 100` being calculated using integer arithmetic even when `calc` is a float, as it's only cast *after* the full calculation. Try adding a `.0` to all those numbers — Filipe Rodrigues, Jun 13 '23 at 21:14
the numbers are all stored in integer variables in the C program ; for the test program on https://www.onlinegdb.com/ I just used the respective values as they are used in the C program — Mario Christov, Jun 13 '23 at 21:17
Well, cast them to float if you want to retain the accuracy then. Otherwise you can also try to multiply first, then divide, which should result in more precision (provided you do not overflow your integers). — Filipe Rodrigues, Jun 13 '23 at 21:18
I tried to use uint64_t instead of uint32_t and thus avoid overflow when doing all the multiplications first, and then all the divisions, but it is exactly an integer overflow warning that I get, i.e. `uint64_t calc1 = 150 * 10000 * 100 * 100 / 220 / 155 / 138 ;` causes `warning: integer overflow in expression of type ‘int’ results in ‘2115098112’ [-Woverflow]` and results in 449 — Mario Christov, Jun 13 '23 at 21:26
_the numbers are all stored in integer variables in the C program_ There you have it. `150 * 10000 / 155` doesn't divide cleanly so integer division will chop off the decimal. Your problems snowball from there. As suggested, switch to floating point math if you want a more accurate result, but understand you'll rarely get the [_exact_ answer](https://stackoverflow.com/questions/588004/is-floating-point-math-broken) — yano, Jun 13 '23 at 21:31
About *"I have to multiply and divide the following integers in the following way"*, may I ask why? Where those "magic" numbers come from? Are you allowed to simplify the terms? — Bob__, Jun 13 '23 at 21:32
try [adding the `ULL` suffix](https://stackoverflow.com/questions/8809292/ull-suffix-on-a-numeric-literal) to each magic number to designate them as `unsigned long long`, should make that warning go away, but as suggested simplifying seems like the cleverer option. — yano, Jun 13 '23 at 21:36
@Bob__ "Where those "magic" numbers come from?" --> it's a formula for calculating an application rate for a sprayer :-) — Mario Christov, Jun 13 '23 at 21:41
Well, my point was if you were allowed to do something like `float calc = 150 * 21.25037f;` or even `uint32_t calc = 150 * 21250 / 1000;` — Bob__, Jun 13 '23 at 21:56
@Mario Christov, Curious, Why use `float calc` and not `double calc`? — chux - Reinstate Monica, Jun 14 '23 at 03:49
@chux-ReinstateMonica "ESP32 does not support hardware acceleration for double precision floating point arithmetic (double). Instead double is implemented via software hence the behavioral restrictions with regards to float do not apply to double. Note that due to the lack of hardware acceleration, double operations may consume significantly larger amount of CPU time in comparison to float." I read this under https://docs.espressif.com/projects/esp-idf/en/v4.2/esp32/api-guides/freertos-smp.html#floating-point-arithmetic — Mario Christov, Jun 14 '23 at 06:01
@MarioChristov Thanks for that info. Note that `printf ( "calc = %f\n", calc ) ;` converts the `float` to `double` before passing the `double` value to `printf()`, unless the compiler optimizes through that. — chux - Reinstate Monica, Jun 14 '23 at 07:30

Jan Schultke · Accepted Answer · 2023-06-14T08:34:05.723

8

You're not losing any precision on your calculator or mobile device. The precise result is 3187.5557822261889583067701... and your mobile device is approximating that quite accurately.

The problem is that in the expression 150 * 10000 / 155 / 138 * 100 / 220 * 100, all multiplications and divisions are between integers. Even if you use this expression to initialize a float, it's too late then; the precision is already lost.

To get a more precise result, make the first operand a float by adding a .f suffix, and all operations will be between floating point numbers then:

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

int main(void) {
    float calc = 150.f * 10000 / 155 / 138 * 100 / 220 * 100;
    printf( "calc = %f\n", calc );

    uint32_t calc0 = 150.f * 10000 / 155 / 138 * 100 / 220 * 100;
    // note: PRIu32 expands to the correct format specifier for uint32_t
    printf( "calc0 = %" PRIu32 "\n", calc0 );

    // tip: we can use unsigned long long (ull suffix) and shift all of the
    //      multiplications to the start to minimize precision loss
    //      (unsigned long long is needed to prevent overflow)
    uint32_t calc1 = 150ull * 10000 * 100 * 100 / 155 / 138 / 220; // 1)
    printf( "calc1 = %" PRIu32 "\n", calc1 );
}

Prints:

calc = 3187.555420
calc0 = 3187
calc1 = 3187


¹⁾ Instead of / 155 / 138 / 220, we can also write / (155ull * 138 * 220). The result is guaranteed to be the same.

edited Jun 14 '23 at 08:34

answered Jun 13 '23 at 21:26

Jan Schultke


I type-cast the 1st operand to (float) in the calculation and now I get the more precise value; thanks for the `ull` tip, wasn't aware of it :-) – Mario Christov Jun 13 '23 at 22:12
@Jan Schultke `"%u"` is best with `unsigned` and not `uint32_t`. Or use `"%" PRIu32` with `uint32_t`. – chux - Reinstate Monica Jun 14 '23 at 07:35
@ReinstateMonica thanks. I believe the `PRIu32` solution is better because on a microcontroller, there is a very real chance that `unsigned int` is 16-bit. – Jan Schultke Jun 14 '23 at 08:14
The OP is apparently using a MCU without FPU so the question is not how to carry this out with floating point, but with fixed point. – Lundin Jun 14 '23 at 09:44
@Lundin actually the datasheet states "Xtensa® 32-bit LX7 dual-core processor" with "Single-precision floating-point unit (FPU) to accelerate computing", it's just that I am currently still trying to avoid the float intentionally as tasks which do not interleave when using only ints start to do it with floats, and there are some limitations with regard to float + FreeRTOS which I still haven't carefully examined and which are described here : https://docs.espressif.com/projects/esp-idf/en/latest/esp32s3/api-reference/system/freertos_idf.html#floating-point-usage – Mario Christov Jun 15 '23 at 08:44
@MarioChristov Be that as it may, there is no actual reason to use floating point in this specific case. Unless you need to use trigonometry and similar requiring math.h etc, it is unlikely that you actually need floating point. – Lundin Jun 15 '23 at 09:16
@Lundin here I am totally with you :-) – Mario Christov Jun 15 '23 at 09:21

paxdiablo · Answer 2 · 2023-06-13T22:07:07.487

If your expression has only int terms, it does int calculations, regardless of whether or not it gets assigned to a float afterwards⁽¹⁾. The assignment has no effect on the calculation itself.

For example, the sub-expression 150 * 10000 / 155 will give you the truncated value 9677 rather than the more accurate value (around 9677.419).

You need to tell the compiler to do the calculation in floating point, which can be done simply by making the first term 150.0 rather than 150.

Keep in mind this will actually do the calculations as double (higher range and precision than float). Then any precision-loss adjustments will be made when that's assigned to the float target. If that's a problem with your (possibly limited) platform, you can still use float in the calculation by using 150.0f⁽²⁾.

That's also why your "using uint64_t and doing multiplications first" (in one of your comments) didn't help here. That only affects the final type, the calculations are still done with the int type so any overflow would not be avoided. Again, you can fix that by using something like (uint64_t)150 rather than 150 in your expression.

⁽¹⁾ The general rule with an expression operand1 operation operand2 is that the two operands are first modified in such a way that they have the same "common real type".

So, for example, two int operands would stay int. For an int and long, the int would be converted to long. The result is then of that same type. This is covered in the ISO (C17) standard under 6.3.1 Arithmetic operands and 6.3.1.8 Usual arithmetic conversions. The former gives the ranks of the various integral types, the latter explains how the conversions are performed in detail.

Importantly here, a float and an int would have the latter "upgraded" to a float before doing the calculation.

⁽²⁾ There's a lot of discussion in various net locations about the floating point performance on the ESP32 so double operations may be problematic depending on your needs. But, as with all optimisation issues, you should measure, not guess. You can then intelligently make a cost/benefit decision to choose between speed and range/precision.

OP's question was specific to an ESP32. I would seriously advise against using double precision floating point numbers on 32-bit microcontrollers (due to obvious performance concerns). — Jan Schultke, Jun 13 '23 at 21:37
Good point, @JanSchultke, have added info to the answer to address this. — paxdiablo, Jun 13 '23 at 21:46
wow, thanks for the thorough explanations (especially regarding why an uint64_t variable did not help --> was still wondering what was going on there before your answer) and reference to exact chapters of the standard, very difficult to choose the answer :-( — Mario Christov, Jun 13 '23 at 22:26
That's okay, Mario, the rep is more impactful on Jan than me :-) — paxdiablo, Jun 14 '23 at 02:01

score 2 · Answer 3 · answered Jun 13 '23 at 21:27

Integer division truncates the value towards zero. In the case of 150 * 10000 / 155 the result is 9677.41935484 but becomes 9677, then is divided by 138 which should be 70.126227209 but becomes just 70. The subsequent multiplications amplify the magnitude of the error. In short, integer division costs precision. Possibly this could be mitigated by reordering the calculations, to do the divisions last and thus have fewer multiplications that 'magnify' the errors. Though do be aware of overflows.

Lundin · Answer 4 · 2023-06-15T05:50:10.597

In microcontrollers without a FPU you generally want to stick to fixed point arithmetic and set the accuracy manually. Using float with no FPU present means horrible program performance, since the compiler will then handle floating point through software libs instead. Forget about that and stop taking bad advice from PC programmers.

Fixed point arithmetic isn't rocket science - you need to know the integer limits, elementary school math and a bit of common sense :) Basically it means: multiply first, then divide, because you only lose accuracy during division and integer division truncates all decimals.

In case of unsigned 32, the upper limit is 2^32 - 1 or if you will 4.29 billion. Your worst case calculation should not exceed this value at any point.

NOTE: all integer constants have a type and in case of constants like 150, it's int, which is signed-32 and not really a helpful type. We can either suffix all constants with u or maybe ul to ensure the correct type, or we can make sure that one of the operands is of the correct type, in which case the usual arithmetic conversions will convert the other operand to that type.

If we want 3 decimals, then we'd simply multiply by 10^3 and then rearrange the equation so that we don't hit the upper limit anywhere:

uint32_t calc = 1000; // 3 decimals
    calc *= 150;
    calc *= 10000;
    calc /= 155;
    calc *= 100;
    calc /= 138;
    calc *= 100;
    calc /= 220;
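// casts match the %lu specifier; %03lu keeps leading zeros in the fraction
printf("%lu.%03lu", (unsigned long)(calc/1000), (unsigned long)(calc%1000));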

Output: 3187.555.

Now of course you will lose a bit of accuracy during each division since the calculation is capped at 3 decimals in this example. And the end result is truncated, not rounded. If accuracy is important, then you could step up to 64 bit arithmetic. This is slower on a 32 bit MCU but it is still far faster than software floating point. And as it turns out, probably more accurate too:

uint64_t calc = 10000000000ull; // 10 decimals
    calc *= 150;
    calc *= 10000;
    calc /= 155;
    calc *= 100;
    calc /= 138;
    calc *= 100;
    calc /= 220;
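// casts match the %llu specifier; %010llu keeps leading zeros in the fraction
printf("%llu.%010llu", (unsigned long long)(calc/10000000000ull), (unsigned long long)(calc%10000000000ull));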

Output: 3187.5557822261
