0

When I read a .csv file with pandas and then save it again the values in some rows are modified for example from 0.007 to 0.006999999999999999 . Here is example to better understand the problem:

d3 = pd.read_csv("third_table.csv", sep=';') # read the original csv
d3 = d3.loc[322:325] # I just selected these rows for this example because row 322 has this problem
d3['value'] = d3['value'].astype(float) # just to avoid questions about which type is the data in this column
print(d3[d3.value == 0.006999999999999999]) # returns empty dataframe
print(d3.iloc[0].value) # returns 0.07
d3.to_csv("~/Desktop/test.csv", sep=';', index=False) # saving to file

And then

d_test = pd.read_csv("~/Desktop/test.csv", sep=';') # reading newly created file
print(d_test[d_test.value == 0.006999999999999999]) # returns one row
print(d_test.iloc[0].value) # returns 0.006999999999999999

Why is this happening and how can I prevent pandas from doing this? Also the original file already contains around 200 rows with such long float values and when I tried to just simply read and then write to new file, the number of rows with problematic values decreased a little.

I have also checked the file directly and the new file for this particular case appears to be having correct 0.007 value written in it, so why does it read it incorrectly? (dtype of column 'value' in new file is float)

And also a following question. What is the best way to round such floats 0.006999999999999999 if I don't know upfront how many digits after the dot should be there(could be numbers 0.006999999999999999 and 0.06999999999999999 in the column).

Ronny Strom
  • 43
  • 1
  • 8
  • 4
    See if [Is floating point math broken?](https://stackoverflow.com/q/588004/5987) sheds any light. – Mark Ransom May 04 '20 at 01:52
  • Thanks, this did help to better understand what's going on here. So the best way to solve this would be just round all float to let's say 3 digits after coma, right? – Ronny Strom May 04 '20 at 02:02
  • Rounding would help, but it's difficult to know how to apply the correct rounding. You can count on a `float` to be accurate to around 16 total digits, but that's including digits on both sides of the decimal point. `print` applies some rounding so you can't count on it to give you an exact representation. – Mark Ransom May 04 '20 at 03:23

1 Answers1

1

When you write csv, you can add float_format='%.3f', like this d3.to_csv("~/Desktop/test.csv", sep=';', float_format='%.3f', index=False)

Another way to make it right is, to avoid using float. You can multiply the column with 1000, and divide by 1000 in the end.

vcycyv
  • 606
  • 7
  • 9