
pandas.DataFrame.round() gives somewhat unpredictable output on the data frame below. Can anyone please help me understand what is happening here?

import pandas as pd

df = pd.DataFrame([(61.21, 69.32), (65.1938, .67)], columns=['A', 'B'], dtype='float32')
df.round(2)

The above code outputs:

           A      B
0  61.209999  69.32
1  65.190002   0.67

Why does round(2) not truncate all but the first 2 digits after the decimal point? When I try the same with dtype='float64' it works and outputs the following:

df.astype('float64').round(2) 
       A      B
0  61.21  69.32
1  65.19   0.67

Is there something special about the values in the data frame?

I don't want to format the output as strings (round(2).applymap('{:.2f}'.format)) just to get the desired number of decimals. I want to know why the columns are showing different numbers of digits after the decimal point.
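For reference, the string-formatting workaround would look roughly like this (using the df defined above; note that it produces string columns, not floats):

# Format the rounded values as 2-decimal strings purely for display.
# The result holds strings ('61.21', '65.19', ...), not numeric data.
formatted = df.round(2).applymap('{:.2f}'.format)
print(formatted)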

I am using pandas version '0.24.1'.

rahul891
  • The `float64` values also differ from the decimal results you see and are expecting (but should not). Instead of 65.19, the actual value is 65.18999999999999772626324556767940521240234375, but the software is not showing you enough digits to see that. It is **impossible** to round binary floating-point numbers to two decimal places (except for .00, .25, .50, and .75) because **binary** floating-point numbers do not have **decimal** places. The values 65.19 and 61.21 are simply not representable in binary floating-point. – Eric Postpischil Feb 07 '20 at 14:35
  • In the `float64` format, the differences are small enough that you cannot see them in the digits the software shows you. In the `float32` format, they are (or at least some of them are) large enough that you see them. – Eric Postpischil Feb 07 '20 at 14:37
  • @EricPostpischil so x = 1.5 will store the nearest floating point value to 1.5 but never exactly 1.5 ? In my example above 65.1938 is rounded to 65.190002 in the 32-bit floating point scheme because 65.190002 is the closest 32-bit floating point number to 65.19 ? – rahul891 Feb 07 '20 at 20:39
  • 1.5 is representable in floating-point; it will be stored exactly. The nearest value to 65.19 in IEEE-754 binary32 is 65.19000244140625. Some software displays it as “65.190002”, but it is not 65.190002; it is 65.19000244140625. – Eric Postpischil Feb 07 '20 at 20:46
  • After reading your comments, I just started playing around with floats a little and got some mind-blowing (at least to me) results. 0.2 + 0.4 == 0.6 returned false! I understand this is because of limitations of IEEE-754 binary32 standard. How does one avoid pitfalls such as these? (0.4 + 0.2 == 0.6) Is there a safe way to perform floating point arithmetic and get expected (in the general mathematical sense) results? – rahul891 Feb 07 '20 at 21:08
  • See [my comment here](https://stackoverflow.com/questions/60083787/multiplication-issue-dividing-and-multiplying-by-the-same-decimal-does-not-retu#comment106285573_60083787). By and large, floating-point arithmetic is designed for approximating continuous functions. `x == y` is not a continuous function; it has a discontinuity at *x* = *y*. Generally, you should not be using floating-point arithmetic in situations where you need to test numbers for equality (and you should not attempt workarounds to make it work in such situations). – Eric Postpischil Feb 07 '20 at 21:31
  • If you want to do physics simulations with floating-point numbers, go ahead. If you want to do exact mathematics with real-number arithmetic, do not use floating-point. – Eric Postpischil Feb 07 '20 at 21:32
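A short illustration of the values mentioned in the comments above (a sketch using only the standard decimal module; the printed digits are the exact stored binary64 values):

from decimal import Decimal

# The nearest binary64 value to 65.19, printed exactly: it is not 65.19.
print(Decimal(65.19))     # 65.18999999999999772626324556767940521240234375

# 0.2 + 0.4 rounds to the nearest binary64 value, which is not the same as
# the nearest binary64 value to 0.6, so the equality test fails.
print(repr(0.2 + 0.4))    # 0.6000000000000001
print(0.2 + 0.4 == 0.6)   # False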

1 Answer


Floating points are not precise: not every number can be represented exactly. In the same way that 1/3 can't be represented precisely by a finite decimal numeral, there are many numbers which can't be represented exactly in an IEEE floating-point format (even though they may be representable in a decimal system).

In such a case, floating-point operations, including rounding (to decimal places), return a value that depends on the chosen rounding mode, usually the nearest representable floating-point number.

This number might be just slightly less than the intended decimal value with two decimal places, and therefore shows a lot of nines in its decimal notation.
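A minimal sketch of this for the value 61.21 from the question (assuming numpy and the standard decimal module; the printed digits are the exact stored values):

import numpy as np
from decimal import Decimal

# Nearest binary32 value to 61.21: it is slightly below 61.21, which is why
# pandas shows the rounded float32 column as 61.209999.
print(Decimal(float(np.float32(61.21))))

# Nearest binary64 value to 61.21: also not exactly 61.21, but close enough
# that Python's shortest round-trip repr prints just 61.21.
print(Decimal(61.21))
print(repr(61.21))      # 61.21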

Sebastian Hoffmann
  • In a general sense, rounding-off to the n-th digit means dropping all digits to the right of the n-th place after decimal point. However, for floating points the rounding operation somehow seems a little counter-intuitive. Doesn't that defeat the purpose of "rounding" in these cases where there is not exact representation? Or am I missing something here? – rahul891 Feb 07 '20 at 14:26
  • Thinking that “Floating points are not precise” is not the correct model to use. Each floating-point datum that is not a NaN represents one number **exactly**. It does not imprecisely represent a number or a range of numbers. It is exactly one number. When a number, such as a decimal numeral for a number in a string, is converted to floating-point, the conversion is an operation whose result is **exactly** the number representable in floating-point closest to the input number. This is the same as for integers—when we convert 2.3 to integer and get 2, we do not think that 2 represents 2.3. – Eric Postpischil Feb 07 '20 at 14:32
  • Similarly, the floating-point number near 65.19, 65.19000244140625, is simply and exactly 65.19000244140625; it does not represent 65.19. Using a correct model in which each floating-point datum represents a number exactly and it is the operations that round to representable values is essential to be able to write proofs about floating-point code and to design floating-point code. – Eric Postpischil Feb 07 '20 at 14:34
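A brief sketch of the model described in these comments, using the 65.19 value quoted above (numpy assumed for the 32-bit type):

import numpy as np
from decimal import Decimal

# Converting 65.19 to binary32 is the operation that rounds; the result is
# the nearest representable value...
stored = np.float32(65.19)

# ...and that stored value is exactly one number, not a stand-in for 65.19.
print(Decimal(float(stored)))                     # 65.19000244140625
print(stored == np.float32(65.19000244140625))    # True: the same exact value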