How are large and small Floating point values represented with Fixed point

Question

I know that on machines without a floating-point unit (FPU), such numbers must either be represented in fixed point, or the FPU is emulated in software using libc.

In the former case, how are very large and very small floating-point values represented in fixed point? My understanding is that fixed point has a limited decimal resolution, as explained here: Is fixed point math faster than floating point?

So if I try to compile a very small or very large floating-point value in C for a machine without an FPU, how would that number get represented in fixed point?

Floating point is always floating point in C. It's not fixed point. — user253751, Jun 30 '20 at 16:07
If a machine doesn't have an FPU, floating point calculations are done in software. — Barmar, Jun 30 '20 at 16:13
How you choose to represent numbers in fixed point is up to the programmer, and varies by application. Money, for example, is often represented as integer tenths of a cent. An astronomical database might use integer gigameters or something. It depends entirely on the application. — Lee Daniel Crocker, Jun 30 '20 at 16:21
A 64-bit fixed point number has a decimal resolution of 1/(2^32), while a floating point number has a decimal resolution of up to 1/(2^53), so how do you represent that big of an FP value with fixed point when it doesn't even seem possible? — Dan, Jun 30 '20 at 16:23
By using an array of integers. That is how a bignum library works. So if you use an array of eight 32-bit integers, you can have a 256-bit number, and can use a 64-bit variable for partial products etc. In a similar way that you can write a decimal number on paper, as long as the size of paper allows, and do 'longhand' arithmetic operations on it. — Weather Vane, Jun 30 '20 at 16:38
@Dan: There is no single definition of a 64-bit fixed-point number. A 64-bit fixed-point number is simply a 64-bit integer (signed or unsigned, as desired) multiplied by a fixed scale. That scale may be binary, decimal, or anything else; it is not required to be 1/2^32. There is also no single definition of a floating-point number. The IEEE-754 binary64 format has 53-bit significands, but they do not have a “decimal resolution of up to 1/(2^53).” — Eric Postpischil, Jun 30 '20 at 20:37

the busybee · Answer 1 · 2020-07-04T12:46:53.073

1

In principle it does not matter whether you have a common floating point number (for example IEEE-754 float or double) or a fixed point number. Both have their limits towards very large (absolute) values and towards very small values.

A very small number (smaller than the smallest nonzero fixed point value, or half of it, depending on rounding) will be represented as zero. The relative inaccuracy grows as a value approaches this lower limit, because fewer significant digits are available.
A very large number (larger in absolute value than the largest fixed point value, plus a small margin depending on rounding) cannot be represented.

The examples below use a decimal base for convenience; in practice, numbers are usually stored on a binary base:

Let's assume you have a signed fixed point format with 3 digits left of the decimal point and 2 digits right of it.

The smallest difference between one value and another is 0.01.
The smallest values not equal to zero are -000.01 and +000.01.
The largest values are -999.99 and +999.99.
Rounding is presumed.

A value like PI will be represented as 3.14, giving an inaccuracy of about 0.05%.

If you try to assign an absolute value smaller than 000.005 it will be represented as 0.

If you try to assign a small value like 0.12345, it will be represented as 0.12, giving an inaccuracy of just about 3%.

If you try to assign a value larger than or equal to 999.995, it cannot be represented. If your format defines the concept of overflow, overflow will be the result.
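The decimal example above can be sketched in C. This is only an illustration under stated assumptions: the names `fx_from_double`, `FX_SCALE`, `FX_MAX` and `FX_LIMIT` are made up for this sketch, the value is stored as an integer count of hundredths, and rounding is half away from zero. A real fixed point library may choose differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decimal fixed point format: 3 digits left and 2 digits
   right of the decimal point, stored as an integer count of hundredths. */
#define FX_SCALE 100.0     /* 10^2: two fractional digits            */
#define FX_MAX   99999     /* raw value representing +999.99         */
#define FX_LIMIT 99999.5   /* first scaled magnitude that overflows  */

/* Convert a double to the fixed point format, rounding half away from
   zero. Returns false on overflow (or NaN) and leaves *out untouched. */
static bool fx_from_double(double x, int32_t *out)
{
    assert(out != NULL);

    double scaled = x * FX_SCALE;

    /* Written so that NaN also fails the range check. */
    if (!(scaled < FX_LIMIT && scaled > -FX_LIMIT))
        return false;

    /* Add half a step, then let the cast truncate toward zero. */
    *out = (int32_t)(scaled + (scaled >= 0.0 ? 0.5 : -0.5));
    return true;
}
```

With this sketch, 3.14159265 is stored as the raw value 314 (3.14), 0.12345 as 12 (0.12), 0.004 as 0, and 1000.0 is rejected as an overflow.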

edited Jul 04 '20 at 12:46

answered Jun 30 '20 at 16:45

the busybee


I see. let's say I have a `32 bit int` and a `64 bit double fraction`. I know on microcontrollers with 8 bit registers, a big `int` is divided into chunks. Does the same happen to a `64 bit fraction` being represented with fixed point? on chips with no FPU? – Dan Jun 30 '20 at 16:50
@Dan Why would you use a `double` as fraction, wasting the bits of its exponent? -- Sure, on an 8-bit system, all objects larger than 8 bits need to be divided into 8-bit chunks. Or did I not get your point? – the busybee Jul 02 '20 at 17:04
That's not the accuracy, that's the precision loss, the opposite. Also, 999.995 may well be truncated to 999.99 depending on the rounding strategy. – John McFarlane Jul 03 '20 at 07:31
@JohnMcFarlane Sorry, English is not my native language. Accuracy and precision are both translations of what I mean: the relative difference between the exact value and its representation. – the busybee Jul 03 '20 at 08:57
@thebusybee I agree that accuracy and precision are interchangeable here. But your numbers describe inaccuracy, not accuracy. They don't describe precision; they describe the loss of precision. I could agree with your answer if it wasn't saying the opposite of what it meant to say. – John McFarlane Jul 03 '20 at 21:20

chux - Reinstate Monica · Answer 2 · 2020-06-30T16:36:33.580

How are large and small Floating point values represented with Fixed point
how would that number get represented in fixed point?

Fixed point representation, unlike floating point, is not defined by C nor in the Standard C library.

Fixed point range and precision details are up to the programmer, and perhaps to whichever auxiliary fixed-point library is selected.
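As an illustration of a format a programmer might pick, here is a minimal sketch of a binary Q16.16 type: a 32-bit signed integer scaled by 1/2^16. The names `q16_16`, `Q_ONE`, `q_from_int`, `q_mul`, `q_div` and `q_to_double` are hypothetical and not part of C or any particular library, and overflow is deliberately not handled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical Q16.16 format: raw integer value times 1/65536. */
typedef int32_t q16_16;

#define Q_FRAC_BITS 16
#define Q_ONE ((q16_16)1 << Q_FRAC_BITS)   /* raw value of 1.0 */

/* Convert an integer; the integer part has only 16 bits including sign. */
static q16_16 q_from_int(int i)
{
    assert(i >= -32768 && i <= 32767);
    return (q16_16)i * Q_ONE;
}

/* Multiply: the 64-bit product carries 32 fractional bits, so divide by
   the scale once. Truncates toward zero; overflow is not checked. */
static q16_16 q_mul(q16_16 a, q16_16 b)
{
    return (q16_16)(((int64_t)a * b) / Q_ONE);
}

/* Divide: pre-scale the dividend to keep 16 fractional bits in the result.
   Truncates toward zero; overflow is not checked. */
static q16_16 q_div(q16_16 a, q16_16 b)
{
    assert(b != 0);
    return (q16_16)(((int64_t)a * Q_ONE) / b);
}

/* For display only; on a machine without an FPU this conversion itself
   would run in software floating point. */
static double q_to_double(q16_16 x)
{
    return (double)x / Q_ONE;
}
```

In this format the smallest nonzero step is 1/65536 and the largest value is just under 32768. Choosing a different split, such as Q8.24, trades range for resolution.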

Whether or not an implementation has a floating point unit has little to no impact on floating point encoding and functionality. Of course, performance is affected.

An FPU may or may not influence fixed point selection.
