Round a x float to y with a reduced mantissa (significant) floating point

Question

I'd like to understand how to round for example an IEEE float32 with 23 bit with a reduced mantissa one (i.e 20,19). Step by step solution in Python is desirable.

https://stackoverflow.com/questions/46093123/extracting-floating-point-significand-and-exponent-in-numpy could be a helpful starting point — meowgoesthedog, Jan 10 '19 at 16:40
You want something like this? 1.00 --> '1' 1.20 --> '1.2' 1.23 --> '1.23' 1.234 --> '1.23' 1.2345 --> '1.23' — Amit Amola, Jan 10 '19 at 16:40
Do you want an answer in pure Python or could the `numpy` module (a frequently-used third-party module that is used in the accepted answer in the link in the first comment) be used? — Rory Daulton, Jan 10 '19 at 17:13
Are you asking how, given the 24 bits of a significand to be reduced to 20 bits, one would decide whether to round up or down? Or how one could, given a floating-point value in Python, one can extract the significand and later construct a new floating-point value with a modified value? Or simply how to implement a function that rounds the significand to 20 bits? These are different questions. The first is about specifying the function. The second is about manipulating Python entities. The third can be done with Veltkamp-Dekker splitting, without manipulating the significand directly. — Eric Postpischil, Jan 10 '19 at 17:47

Rory Daulton · Answer 1 · 2019-01-10T23:39:52.803

Here is an answer using only basic Python and the standard math module.

from math import frexp, ldexp

def roundbits(fval, nbits):
    """Return the floating-point value `fval` rounded to `nbits` bits
    in the significand."""
    significand, exponent = frexp(fval)
    scale = 2.0 ** nbits
    newsignificand = round(significand * scale) / scale
    return ldexp(newsignificand, exponent)

Here is the explanation. To round a value to nbits bits in the significand of its floating-point representation, first get the bits of the significand, scaling them so the most significant bit comes just after the radix point and other bits follow. That is what the frexp function does. Next, multiply that significand by 2**nbits, which puts the number of bits that we want before the radix point and the unwanted bits after the radix point. (Both that scale value and the multiplication will be done exactly, for reasonable values of nbits.) Then we round that value to the nearest integer, which removes the unwanted bits. Then divide that value back by 2**nbits which shifts the most significant bit back to just after the radix point. Finally, use the new significand and the old exponent to make a floating-point number--which is what the ldexp function does.

This routine was kept simple so it ignores some edge cases. What if the number of desired bits is negative or zero or greater than Python can store in its floating-point values? Python usually uses the double type, which uses 64 bits for the entire floating-point number and 53 bits (52 explicitly stored) in the significand, but this is not guaranteed. I'll leave the error checking to you.

If you test this routine with

for n in range(1, 52):
    print(n, roundbits(pi, n))

which uses pi, which is in binary to 53 bits,

11.00100100001111110110101010001000100001011010001100001000110100

the result is

1 4.0
2 3.0
3 3.0
4 3.25
5 3.125
6 3.125
7 3.15625
8 3.140625
9 3.140625
10 3.140625
11 3.140625
12 3.1416015625
13 3.1416015625
14 3.1416015625
15 3.1416015625
16 3.1416015625
17 3.1416015625
18 3.1415863037109375
19 3.1415939331054688
20 3.1415939331054688
21 3.141592025756836
22 3.1415929794311523
23 3.141592502593994
24 3.1415927410125732
25 3.1415926218032837
26 3.1415926814079285
27 3.141592651605606
28 3.141592651605606
29 3.141592651605606
30 3.1415926553308964
31 3.1415926534682512
32 3.1415926534682512
33 3.1415926534682512
34 3.141592653701082
35 3.1415926535846666
36 3.1415926535846666
37 3.1415926535846666
38 3.1415926535846666
39 3.1415926535919425
40 3.1415926535883045
41 3.1415926535901235
42 3.1415926535901235
43 3.1415926535896688
44 3.141592653589896
45 3.1415926535897825
46 3.1415926535897825
47 3.1415926535897825
48 3.1415926535897967
49 3.1415926535897967
50 3.141592653589793
51 3.141592653589793

This is an interesting idea but I wonder if it really always does what the author wants i.e. is there a guarantee that a the `round` step will remove exactly all the unwanted bits (and for example will not leave some other garbage far at the end of mantissa)? — SergGr, Jan 11 '19 at 04:20
@SergGr: I am not sure this does exactly that the OP wants but this does satisfy the brief problem description. The `round` step is guaranteed to result in an integer so the unwanted bits will definitely be gone. I'm not sure the OP wants to round up to a higher order of magnitude (with a larger exponent), such as `pi` rounding up to `4` if only one bit is desired--but that is the most reasonable thing to do. — Rory Daulton, Jan 11 '19 at 10:43

Round a x float to y with a reduced mantissa (significant) floating point

1 Answers1

Linked