Float values changed when splitting a DataFrame

Question

Yesterday I ran into a problem while splitting one large .csv file into several .csv files by date using pandas: in many of the output files, a tax_rate value that was 6.712 in the original file became 6.712000000000001.

When reading the original .csv file, I specified that the tax_rate column should be of float type, and I haven't edited that column at all.
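For anyone trying to reproduce this, the workflow described above might look roughly like the sketch below. The `tax_rate` column comes from the question; the `date` and `item` columns, the sample rows, and collecting the output in an in-memory dict instead of real files are assumptions made so the example is self-contained. Whether the 6.712 value actually comes back altered may depend on the pandas version and its default float parser.

```python
import io

import pandas as pd

# Hypothetical stand-in for the large input file. Only "tax_rate" is from
# the question; "date" and "item" are assumed column names.
csv_text = """date,item,tax_rate
2021-03-01,A,6.712
2021-03-01,B,7.25
2021-03-02,C,6.712
"""

# Read the file, forcing tax_rate to float as described in the question.
df = pd.read_csv(io.StringIO(csv_text), dtype={"tax_rate": float})

# Split by date and "write" one CSV per date. The results go into a dict
# of strings (hypothetical file names) rather than onto disk.
outputs = {}
for date, group in df.groupby("date"):
    buf = io.StringIO()
    group.to_csv(buf, index=False)
    outputs[f"{date}.csv"] = buf.getvalue()

for name, text in outputs.items():
    print(name)
    print(text)
```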

Here are the original values: [screenshot not preserved]

Here you can see the problem: [screenshot not preserved]

Please see the Python tutorial on floating-point arithmetic: https://docs.python.org/3/tutorial/floatingpoint.html — Artyom Akselrod, Mar 16 '21 at 07:43
The value 6.712 cannot be represented exactly in binary floating point because it is not a finite sum of powers of 2. There is no way to "prevent the problem" because that is how floating point works. This is not a `pandas` thing or a Python thing. Python can represent such numbers exactly using `decimal.Decimal`, but `pandas` needs `float`s to do what it does efficiently. — BoarGules, Mar 16 '21 at 08:52
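A small illustration of the point in the comment above, using only the standard library: the literal `6.712` is stored as the nearest binary double, which is not exactly 6.712, while `repr` still prints `6.712` and `decimal.Decimal` built from a string keeps the decimal value exactly.

```python
from decimal import Decimal

x = 6.712

# Decimal(x) shows the exact value of the stored binary double.
print(Decimal(x))

# The stored double is not exactly 6.712 ...
print(Decimal(x) == Decimal("6.712"))  # False

# ... but repr() prints the shortest string that maps back to the same
# double, so the representation error is normally invisible.
print(repr(x))  # '6.712'

# A Decimal constructed from a string keeps the decimal value exactly.
print(Decimal("6.712"))  # 6.712
```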
@Edgg: Some more context would be useful (e.g., code that we can use to reproduce the issue), but I suspect you may be running into something like this: https://stackoverflow.com/q/47368296/270986. Using `float_precision='round_trip'` in your `read_csv` or `read_table` call (if you're actually using `read_csv` or `read_table`) may help. — Mark Dickinson, Mar 16 '21 at 10:46
@BoarGules: There's likely something more than float imprecision going on here. Pandas "optimizes" reading of floating-point values, but in doing so it sacrifices accuracy: the conversion from the decimal string in the file to the floating-point value is no longer correctly rounded. As a result, writing a value out to disk and then re-reading can result in getting a slightly different value back. (In contrast, Python's own `repr` is designed so that `float(repr(x))` _does_ always recover the value of `x`.) `float_precision='round_trip'` tells Pandas to use a slower, more accurate conversion. — Mark Dickinson, Mar 16 '21 at 10:49
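The comment above can be checked with a few standard-library calls (Python 3.9+ for `math.nextafter` and `math.ulp`). `float(repr(x))` recovers `x`, and the value reported in the question, 6.712000000000001, turns out to be the double immediately above the one nearest to 6.712, i.e. one unit in the last place away. That is consistent with, though not proof of, a string-to-float conversion that is not correctly rounded.

```python
import math

x = 6.712

# Python's repr is designed so that float(repr(x)) recovers x exactly.
print(float(repr(x)) == x)  # True

# The next representable double above x.
neighbour = math.nextafter(x, math.inf)
print(repr(neighbour))  # '6.712000000000001'

# The two doubles are exactly one ULP (unit in the last place) apart.
print(neighbour - x == math.ulp(x))  # True
```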
I appreciate everyone's replies. @MarkDickinson, using float_precision='round_trip' seems to have fixed my issue, thank you very much! — Edgg, Mar 16 '21 at 12:26
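A minimal sketch of the fix suggested in the comments, assuming the same hypothetical `date`/`tax_rate` layout as before: pass `float_precision="round_trip"` to `read_csv`, which selects pandas' round-trip converter (slower but more accurate, per the comment above). The checks then confirm that each parsed value equals the Python literal `6.712` and that writing and re-reading gives the same values back.

```python
import io

import pandas as pd

# Hypothetical sample data; only the tax_rate column is from the question.
csv_text = "date,tax_rate\n2021-03-01,6.712\n2021-03-02,6.712\n"

# float_precision="round_trip" selects the slower, more accurate
# round-trip converter in the C parser.
df = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"tax_rate": float},
    float_precision="round_trip",
)

# Each parsed value should be the same double as the literal 6.712.
print((df["tax_rate"] == 6.712).all())  # True

# Writing out and re-reading should reproduce identical values.
buf = io.StringIO()
df.to_csv(buf, index=False)
buf.seek(0)
df2 = pd.read_csv(buf, dtype={"tax_rate": float}, float_precision="round_trip")
print(df2["tax_rate"].equals(df["tax_rate"]))  # True
```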
