Correcting for floating-point arithmetic 'errors' when rounding in pandas

Question

I have to deal with a pair of numbers that I hate (and I am sure there are others).

They are:

a17=0.0249999999999999
a18=0.02499999999999999

Case 1:

round(a17,2) gives 0.02
round(a18,2) gives 0.03

Case 2:

round(a17,3)=round(a18,3)=0.025

Case 3:

round(round(a17,3),2)=round(round(a18,3),2)=0.03

but when these numbers are in a data frame...

Case 4:

df=pd.DataFrame([a17,a18])

np.round(df.round(3),2)=[0.02, 0.02]

Why are the answers I get the same as in Case 1?

With df=pd.DataFrame([a17,a18]), df.round(2) gives 0.02 and df.round(3) gives 0.025, but np.round(df.round(3),2) gives 0.02 — Mark Dranias, Jan 04 '18 at 15:09

score 1 · Accepted Answer · edited Jun 20 '20 at 09:12

When you work with floats, you usually cannot store the EXACT decimal value, only an approximation of it. This follows from the way floats are laid out in memory as binary fractions.

Keep in mind that when you print a float, you always see a decimal approximation of the stored value, and that is not the same thing as the stored value itself.

A double carries only about 15 to 17 significant decimal digits, so the extra digits in a long literal such as 0.xxxx cannot all be preserved.

That is why:

>>> round(0.0249999999999999999,2)
0.03
>>> round(0.024999999999999999,2)
0.02
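To see why the two literals above behave differently, it can help to print the exact value each one is stored as. The following is a minimal sketch using only the standard library; the variable names are illustrative and not part of the original answer.

```python
from decimal import Decimal

# Illustrative names; the literals are the two from the session above.
upper = 0.0249999999999999999
lower = 0.024999999999999999

# The first literal is too close to 0.025 to be told apart from it,
# so it is stored as the same double as 0.025. The second is not.
print(upper == 0.025)   # True
print(lower == 0.025)   # False

# Decimal(float) shows the exact stored binary value, with no rounding.
print(Decimal(upper))   # slightly above 0.025
print(Decimal(lower))   # slightly below 0.025

# The built-in round() acts on those stored values, hence the two results.
print(round(upper, 2), round(lower, 2))   # 0.03 0.02
```

The point of the sketch is that round() never sees the digits you typed, only the nearest double to them.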

This is true for most programming languages (Fortran, Python, C++, etc.).

Let us look at a fragment of the Python documentation:

(https://docs.python.org/3/tutorial/floatingpoint.html)

0.0001100110011001100110011001100110011001100110011...

Stop at any finite number of bits, and you get an approximation. On most machines today, floats are approximated using a binary fraction with the numerator using the first 53 bits starting with the most significant bit and with the denominator as a power of two. In the case of 1/10, the binary fraction is 3602879701896397 / 2 ** 55 which is close to but not exactly equal to the true value of 1/10.

Many users are not aware of the approximation because of the way values are displayed. Python only prints a decimal approximation to the true decimal value of the binary approximation stored by the machine. On most machines, if Python were to print the true decimal value of the binary approximation stored for 0.1, it would have to display

>>> 0.1
0.1000000000000000055511151231257827021181583404541015625

That is more digits than most people find useful, so Python keeps the number of digits manageable by displaying a rounded value instead

>>> 1 / 10
0.1

Just remember, even though the printed result looks like the exact value of 1/10, the actual stored value is the nearest representable binary fraction.

Interestingly, there are many different decimal numbers that share the same nearest approximate binary fraction. For example, the numbers 0.1 and 0.10000000000000001 and 0.1000000000000000055511151231257827021181583404541015625 are all approximated by 3602879701896397 / 2 ** 55. Since all of these decimal values share the same approximation, any one of them could be displayed while still preserving the invariant eval(repr(x)) == x.
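The claims in the quoted fragment can be checked directly. This is a small sketch with the standard decimal and fractions modules; it only restates what the documentation says and assumes an ordinary IEEE 754 double for Python floats.

```python
from decimal import Decimal
from fractions import Fraction

x = 0.1

# Exact decimal expansion of the stored binary value.
print(Decimal(x))
# 0.1000000000000000055511151231257827021181583404541015625

# The stored value is exactly the binary fraction quoted above.
print(Fraction(x) == Fraction(3602879701896397, 2 ** 55))   # True

# Different decimal literals can map to the same stored double.
print(0.1 == 0.10000000000000001)   # True

# repr() picks the shortest string that round-trips to the same double.
print(repr(x))                      # 0.1
print(eval(repr(x)) == x)           # True
```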

Let us look at a fragment of the NumPy documentation:

(https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.around.html#numpy.around)
For context: np.round is equivalent to np.around (see the NumPy documentation).

For values exactly halfway between rounded decimal values, NumPy rounds to the nearest even value. Thus 1.5 and 2.5 round to 2.0, -0.5 and 0.5 round to 0.0, etc. Results may also be surprising due to the inexact representation of decimal fractions in the IEEE floating point standard [R9] and errors introduced when scaling by powers of ten.
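Both effects mentioned in that fragment show up with the value from the question. The sketch below assumes NumPy is installed and that a Python float is an IEEE 754 double; the variable name x is illustrative.

```python
import numpy as np

# Halfway cases go to the nearest even value, as the quoted text says:
# 1.5 and 2.5 both become 2.0, while -0.5 and 0.5 both become zero.
print(np.round([1.5, 2.5, -0.5, 0.5]))

x = 0.025

# Scaling by a power of ten loses the tiny excess stored in x,
# so the scaled value lands exactly on a tie.
print(x * 100 == 2.5)    # True

# NumPy resolves that tie to the even neighbour.
print(np.round(x, 2))    # 0.02

# Python's built-in round() gives a different answer for the same input.
print(round(x, 2))       # 0.03
```

So the two functions disagree on this input even though both follow their own documented rules.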

Conclusions:

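In your case, np.round simply rounded 0.025 to 0.02, following the rules described above (source: NumPy documentation). After df.round(3) both values are 0.025; scaled by 100 that is 2.5, a halfway case, which goes to the even value 2 and therefore to 0.02.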
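For completeness, here is a sketch that reproduces Case 4 and shows one possible way to get the round-half-up result from Case 3 inside pandas. It assumes pandas and NumPy are installed. The helper round_half_up is hypothetical (it is not a pandas or NumPy function); it rounds the shortest decimal representation of each float with the standard decimal module.

```python
from decimal import Decimal, ROUND_HALF_UP

import numpy as np
import pandas as pd

a17 = 0.0249999999999999
a18 = 0.02499999999999999

df = pd.DataFrame([a17, a18])

# Reproduces Case 4 from the question.
step1 = df.round(3)          # both values become 0.025
step2 = np.round(step1, 2)   # both values become 0.02 (tie goes to even)
print(step1[0].tolist(), step2[0].tolist())


def round_half_up(value, ndigits):
    """Hypothetical helper: round a float's shortest repr, with ties going up."""
    quantum = Decimal(10) ** -ndigits           # e.g. Decimal('0.01')
    as_decimal = Decimal(repr(float(value)))    # e.g. Decimal('0.025')
    return float(as_decimal.quantize(quantum, rounding=ROUND_HALF_UP))


# Applied element-wise to the already rounded column, the tie now goes up.
step2_half_up = step1[0].apply(lambda v: round_half_up(v, 2))
print(step2_half_up.tolist())   # [0.03, 0.03]
```

This is slower than vectorized rounding and it only changes how ties in the printed decimal are resolved. Rounding a17 directly to two places with the same helper would still give 0.02, because its shortest representation is below 0.025.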

THANKS for the insight! The last point is what is driving me nuts though: why can't I round in a dataframe? When I try to round the dataframe it can't be done, e.g. df=pd.DataFrame([a17,a18]), then df.round(2) gives 0.02 and df.round(3) gives 0.025, but np.round(df.round(3),2) gives 0.02 — Mark Dranias, Jan 04 '18 at 15:04
I added a fragment of the NumPy documentation and conclusions to help you understand. The cause is simply that Python and NumPy use different rounding rules. — Muritiku, Jan 04 '18 at 15:32
Pardon me, I should have described that at the beginning. I hope I have answered all of your questions on this topic. — Muritiku, Jan 04 '18 at 15:47
I think it also relates to a bug in numpy round. For instance, np.round(0.024999,3) won't round to 3 decimals on my computer. It gives 0.025000000000001 — Mark Dranias, Jan 04 '18 at 16:06
Mark, this is not a bug. :) The float in memory is one thing; the printed result is not the float itself but a decimal approximation of the stored double. Just read the Python floats documentation in the answer (and at the link) carefully. :) — Muritiku, Jan 04 '18 at 16:11
I will meditate on the keywords there... but my initial reaction was: if a program is supposed to return a value rounded to three decimals and does not... — Mark Dranias, Jan 04 '18 at 17:16
@MarkDranias, as I wrote in my answer, round in Python and np.round do not work the same way. np.round(df.round(3),2) = np.round([0.025, 0.025],2) = [0.02, 0.02], so everything is correct, because np.round(0.025,2) = 0.02 (np.round sends this halfway case to the nearest even digit; see the conclusions in the answer) — Muritiku, Jan 05 '18 at 20:01
