
I have a CSV file containing some float data. The code is simple:

df = pd.read_csv(my_csv_file)
print(df.iloc[:2,:4])
                     600663.XSHG  000877.XSHE  600523.XSHG  601311.XSHG
2016-01-04 09:31:00        49.40         8.05        22.79        21.80
2016-01-04 09:32:00        49.55         8.03        22.79        21.75

Then I convert it to float32 to reduce memory usage.

short_df = df.astype(np.float32)
print(short_df.iloc[:2,:4])
                     600663.XSHG  000877.XSHE  600523.XSHG  601311.XSHG
2016-01-04 09:31:00    49.400002         8.05    22.790001    21.799999
2016-01-04 09:32:00    49.549999         8.03    22.790001    21.750000

The values changed! How can I keep the data unchanged?

(I also tried short_df.round(2), but print still gives the same output.)

shaik moeed
user1871453

1 Answer


Many decimal fractions cannot be represented exactly as a float64 or float32. See The Floating-Point Guide if you are unfamiliar with this issue.
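To make this concrete, here is a minimal sketch that expands the exact binary value stored for 49.40 in each type. The variable names (exact64, exact32, err64, err32) are only illustrative, and the value 49.40 is taken from the question's data.

```python
from decimal import Decimal

import numpy as np

# Decimal(float) expands the exact binary value that is actually stored.
exact64 = Decimal(float(np.float64(49.40)))
exact32 = Decimal(float(np.float32(49.40)))

print(exact64)  # close to 49.4, but not exactly 49.4
print(exact32)  # further away from 49.4 than the float64 value

# Compare how far each stored value is from the decimal 49.40.
err64 = abs(exact64 - Decimal("49.40"))
err32 = abs(exact32 - Decimal("49.40"))
print(err64 < err32)
```

Neither type stores 49.40 exactly; float64 is simply close enough that the difference stays hidden at the default display precision.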

By default, pandas displays floats with a precision of 6 decimal places and drops trailing zeros.

float64 represents the example numbers accurately up to (and beyond) 6 decimal places, whereas float32, which carries only about 7 significant decimal digits, cannot:

>>> print("%.6f" % np.float64(49.40))
49.400000

>>> print("%.6f" % np.float32(49.40))
49.400002
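This is also why round(2) does not change what is printed: the value already stored is the closest float32 to 49.40, so rounding lands on the same number. A small sketch to check this, assuming NumPy is available (the names x, up, down and rounded are illustrative):

```python
import numpy as np

x = np.float32(49.40)

# Neighbouring float32 values just above and just below x.
up = np.nextafter(x, np.float32(np.inf))
down = np.nextafter(x, np.float32(-np.inf))

# x is nearer to 49.40 than either neighbour, so no float32 is "more correct".
x_is_nearest = (abs(float(x) - 49.40) < abs(float(up) - 49.40)
                and abs(float(x) - 49.40) < abs(float(down) - 49.40))
print(x_is_nearest)

# Rounding to 2 decimals therefore yields the very same float32 value.
rounded = np.round(x, 2)
print(rounded == x)

# Approximate number of decimal digits each type can represent reliably.
print(np.finfo(np.float32).precision, np.finfo(np.float64).precision)
```

In other words, the data did not get "less rounded"; the display simply shows more digits than float32 can hold.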

If you are not interested in the precision beyond the second decimal place when printing the df, you can set the display precision (recent pandas versions require the full option name, display.precision):

pd.set_option('display.precision', 2)

Then you get the same output even with float32s:

>>> df.astype(np.float32)
                     600663.XSHG  000877.XSHE  600523.XSHG  601311.XSHG
2016-01-04 09:31:00        49.40         8.05        22.79        21.80
           09:32:00        49.55         8.03        22.79        21.75
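If you would rather not change the setting globally, pd.option_context applies it only inside a with block. The sketch below rebuilds a two-column slice of the question's data by hand (the DataFrame here is an assumption standing in for the real CSV):

```python
import numpy as np
import pandas as pd

# Hand-built stand-in for the first rows of the CSV in the question.
df = pd.DataFrame(
    {"600663.XSHG": [49.40, 49.55], "000877.XSHE": [8.05, 8.03]},
    index=pd.to_datetime(["2016-01-04 09:31:00", "2016-01-04 09:32:00"]),
)
short_df = df.astype(np.float32)

# The display precision is changed only inside this block.
with pd.option_context("display.precision", 2):
    shown = short_df.to_string()

print(shown)
```

The stored float32 values are untouched; only their textual rendering is limited to two decimals.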

If you want to round to two decimal places when writing the CSV file back, use float_format:

df.to_csv(file_name, float_format="%.2f")
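As a quick check, the following sketch writes a float32 column to an in-memory buffer instead of a real file and reads it back. The buffer, the single-column DataFrame and the variable names are assumptions made for illustration.

```python
import io

import numpy as np
import pandas as pd

# One float32 column with values taken from the question.
short_df = pd.DataFrame({"600663.XSHG": [49.40, 49.55]}, dtype=np.float32)

# Write to an in-memory buffer; float_format controls the text written out.
buf = io.StringIO()
short_df.to_csv(buf, float_format="%.2f", index=False)
csv_text = buf.getvalue()
print(csv_text)

# Reading the text back yields float64 values parsed from the two-decimal text.
restored = pd.read_csv(io.StringIO(csv_text))
print(restored.dtypes)
```

The file then contains 49.40 and 49.55 rather than the longer float32 expansions.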
w-m