Floating point operations with no library

Question

I am looking for an efficient way to properly do mathematical operations with floating-point values. As I am in the embedded C, I don't want to use any extra library for float data type.

As far as I understand, the correct way here would be to treat a floating-point value as raw binary (sign, exponent, mantissa) and do the operations on that. But I cannot find any examples of how exactly that works.

I am looking for an explanation of how to do the following with no float data type, given a variable int x that can have values from 0 to 10000:

y = x * 0.720 + 84.234;
y = y / 2.5;

Thank you for your time internet

You could look at [fixed point arithmetic](https://stackoverflow.com/questions/10067510/fixed-point-arithmetic-in-c-programming). Floating point itself isn't simple to implement. — Weather Vane, Apr 18 '22 at 09:54
"_As I am in the embedded C, I don't want to use any extra library for float data type._" is a _non sequitur_. There is likely no reason or advantage in not using a library if you are having to write the code in any event. Why do you believe you need to do that? Moreover `float` is a built-in data type; basic operations do not need a library. There are valid reasons for not using floating point in embedded systems, but they may not apply in your case, and simply implementing your own will offer no advantage. — Clifford, Apr 18 '22 at 21:48
This question will surely be closed. A better question would be to provide some code where you think floating point is required and ask how to eradicate it. Discussed recently here https://stackoverflow.com/questions/71725823/c-representing-a-fraction-without-floating-points/71727675 for example. — Clifford, Apr 18 '22 at 21:53

Clifford · Answer 1 · 2022-04-20T08:37:11.573

Floating-point libraries are not required for the example operations you have suggested. While avoiding floating-point code on an embedded system without an FPU is often advisable, doing so by implementing your own floating-point encoding will save you nothing, and will likely be less efficient, less comprehensible and more error prone than using the compiler's built-in FP support.

Instead, you need to avoid floating-point code entirely and use fixed-point encoding. In many cases that can be done ad hoc for individual expressions, but if your application is math intensive (involving trig, logs, sqrt or exponentiation, for example) you might choose to use a fixed-point library or implement your own.

Floating-point dependency is trivially eradicated in the examples you have suggested; for example:

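// y = x * 0.720 + 84.234
// Where x_x1000 = real value * 1000
// 64-bit intermediate: x_x1000 * 720 reaches 7.2e9 when x = 10000, too large for 32 bits
int32_t y_x1000 = (int32_t)(((int64_t)x_x1000 * 720) / 1000) + 84234 ;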

or more efficiently using binary-fixed-point and a 10 bit fractional part:

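// y = x * 0.720 + 84.234
// Where x_q10 = real value * 1024; 737 = round(0.720 * 1024), 86256 = round(84.234 * 1024)
// The shift is parenthesised because + binds tighter than >>; the 64-bit intermediate avoids overflow
int32_t y_q10 = (int32_t)(((int64_t)x_q10 * 737) >> 10) + 86256 ;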

You might also consider int64_t for greater numeric range, in which case you could use more fractional bits for greater precision too.
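To tie this back to both statements in the question, here is a minimal sketch of the whole calculation, including the division by 2.5, which can be written as a multiply by 2 and a divide by 5. It assumes x is the plain integer from the question (0 to 10000, not pre-scaled), so multiplying it by a scaled constant already yields a scaled result and everything fits in 32 bits. The function names are hypothetical, chosen only for illustration.

```c
#include <stdint.h>

/* Decimal scaling: result is y * 1000. Hypothetical helper name. */
static int32_t y_x1000_from_x(int32_t x)
{
    /* x * 0.720 + 84.234, scaled by 1000 (720 and 84234 are exact) */
    int32_t y = x * 720 + 84234;
    /* y / 2.5 == y * 2 / 5; integer division truncates the result */
    return (y * 2) / 5;
}

/* Binary scaling (Q10): result is y * 1024. Hypothetical helper name. */
static int32_t y_q10_from_x(int32_t x)
{
    /* 737 = round(0.720 * 1024), 86256 = round(84.234 * 1024) */
    int32_t y = x * 737 + 86256;
    /* y / 2.5 == y * 2 / 5 */
    return (y * 2) / 5;
}
```

With only 10 fractional bits, 0.720 is approximated as 737/1024, so the Q10 result drifts from the true value as x grows (by roughly 1.1 at x = 10000 after the division). The decimal version represents both constants exactly and loses only the final truncation; more fractional bits would narrow the gap for the binary version.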

If you are doing a lot of intensive fixed-point maths, you would do well to consider a library, or implement one using CORDIC algorithms. An example of such a library can be found at https://www.justsoftwaresolutions.co.uk/news/optimizing-applications-with-fixed-point-arithmetic.html, although it is C++. The clear advantage of C++ here is that, by defining a fixed class with extensive operator overloading, existing floating-point code can largely be converted to fixed point by replacing the double or float keywords with fixed and compiling as C++, even if the code is otherwise non-OOP and entirely C-like.
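If the project has to stay in plain C, the core of a hand-rolled fixed-point layer is small. The following is only a sketch under stated assumptions: it uses a Q16.16 format in an int32_t (range roughly ±32768), and the type `fix16_t` and the helper names are made up for illustration rather than taken from the linked library. It covers multiply and divide only, with no overflow checking, no divide-by-zero handling and no CORDIC functions.

```c
#include <stdint.h>

typedef int32_t fix16_t;                 /* Q16.16: real value * 65536 */
#define FIX16_ONE ((int32_t)65536)       /* 1.0 in Q16.16 */

/* Convert an integer to Q16.16 (caller must keep |i| below 32768). */
static fix16_t fix16_from_int(int32_t i)
{
    return (fix16_t)(i * FIX16_ONE);
}

/* Multiply: the raw product carries 32 fractional bits, so rescale by 2^16.
   Division is used rather than >> so negative values truncate toward zero. */
static fix16_t fix16_mul(fix16_t a, fix16_t b)
{
    return (fix16_t)(((int64_t)a * b) / FIX16_ONE);
}

/* Divide: pre-scale the dividend so the quotient stays in Q16.16.
   b must be non-zero. */
static fix16_t fix16_div(fix16_t a, fix16_t b)
{
    return (fix16_t)(((int64_t)a * FIX16_ONE) / b);
}
```

Addition and subtraction need no helpers because two values in the same format can be added directly. Constants are converted once by hand, for example 2.5 becomes 163840 (2.5 * 65536).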

Note if you do use the code from https://www.justsoftwaresolutions.co.uk/news/optimizing-applications-with-fixed-point-arithmetic.html, be sure to apply the correction described at https://stackoverflow.com/questions/7062046/how-do-i-calculate-distance-between-gps-co-ordinates-on-a-processor-with-poor-flo/7126717#7126717 — Clifford, Apr 18 '22 at 22:32
