
So yesterday I ran into a problem while using pandas to split one large .csv file into several .csv files by date: in many of the output files, a tax_rate value that was 6.712 in the original file became 6.712000000000001.

When reading the original .csv file, I specified that the tax_rate column should be of type float, and I haven't edited that column at all.

Here are the original values: [image: original_values]

Here you can see the problem: [image: the problem]
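The question doesn't include the splitting code, so the following is only a sketch of what such a split might look like. It assumes a hypothetical `date` column and a small in-memory sample in place of the real file, and it passes `float_precision="round_trip"`, the `read_csv` option Mark Dickinson suggests in the comments, so that "6.712" is parsed to the float nearest 6.712.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the large original file; the "date" and
# "amount" column names are illustrative, only tax_rate comes from the question.
RAW_CSV = (
    "date,amount,tax_rate\n"
    "2021-03-01,100.0,6.712\n"
    "2021-03-01,250.0,6.712\n"
    "2021-03-02,80.0,6.712\n"
)

# Read tax_rate as float. float_precision="round_trip" makes the C parser use a
# slower, correctly rounded string-to-float conversion.
df = pd.read_csv(
    io.StringIO(RAW_CSV),
    dtype={"tax_rate": float},
    float_precision="round_trip",
)

# Split into one CSV per date. to_csv() with no path returns the CSV text;
# a real script would pass a filename such as f"{date}.csv" instead.
outputs = {}
for date, group in df.groupby("date"):
    outputs[date] = group.to_csv(index=False)

for date, text in outputs.items():
    print(f"--- {date} ---")
    print(text)
```

Writing to strings rather than files just keeps the sketch self-contained; the parsing option is the part that matters.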

Asked by Edgg; edited by Seanny123.
  • Please see the Python tutorial's section on floating-point arithmetic: https://docs.python.org/3/tutorial/floatingpoint.html – Artyom Akselrod Mar 16 '21 at 07:43
  • The value 6.712 cannot be represented exactly in binary floating point because it is not a finite sum of powers of 2. There is no way to "prevent the problem" because that is how floating point works. This is not a `pandas` thing or a Python thing. Python can represent such numbers exactly using `decimal.Decimal`, but `pandas` needs `float`s to do what it does efficiently. – BoarGules Mar 16 '21 at 08:52
  • @Edgg: Some more context would be useful (e.g., code that we can use to reproduce the issue), but I suspect you may be running into something like this: https://stackoverflow.com/q/47368296/270986. Using `float_precision='round_trip'` in your `read_csv` or `read_table` (if you're actually using `read_csv` or `read_table`) may help – Mark Dickinson Mar 16 '21 at 10:46
  • @BoarGules: There's likely something more than float imprecision going on here. Pandas "optimizes" reading of floating-point values, but in doing so it sacrifices accuracy: the conversion from the decimal string in the file to the floating-point value is no longer correctly rounded. As a result, writing a value out to disk and then re-reading can result in getting a slightly different value back. (In contrast, Python's own `repr` is designed so that `float(repr(x))` _does_ always recover the value of `x`.) `float_precision='round_trip'` tells Pandas to use a slower, more accurate conversion. – Mark Dickinson Mar 16 '21 at 10:49
  • I appreciate everyone's replies. @MarkDickinson using float_precision='round_trip' seems to have helped with my issue, thank you very much! – Edgg Mar 16 '21 at 12:26
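Both explanations in the comments can be checked with the standard library alone. This minimal sketch (the variable names are illustrative) shows that the float stored for 6.712 is not exactly 6.712, that `repr` round-trips it, and that 6.712000000000001 is a genuinely different float rather than a display artefact.

```python
from decimal import Decimal

x = 6.712

# Decimal(float) converts the stored binary value exactly, exposing that the
# float nearest to 6.712 is not 6.712 itself.
exact = Decimal(x)
print(exact)

# repr() produces the shortest string that maps back to the same float,
# so float(repr(x)) recovers x exactly.
print(repr(x))
assert float(repr(x)) == x

# The value seen in the split files is a different float from x, so the two
# compare unequal; per the comments, a string-to-float conversion that is not
# correctly rounded can produce such a neighbouring value.
problem = 6.712000000000001
print(problem == x)
```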
