I've just started learning pandas and noticed some very strange behaviour: reading a CSV file and writing it back changes the values in the cells of the data frame.

before:

64437311025 SMP 1   110.00  0.00    498.00  4174.3865   4243.59 4247.69 4424.62 4570.26 3874.36 4516.41 4412.31 4117.44 4215.38 4300.00 4433.85 4065.64 4394.36 1728.00 1675.00 1517.27 1363.23 0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0   0   0   0   0   0   0   0   0   0.00    0   0.00

after issuing:

df = pd.read_csv(in_file, sep='\t')
df.to_csv(out_file, sep='\t')

I get:

1   64437311025 SMP 1   110.0   0.0 498.0   4174.3864999999996  4243.5900000000001  4247.6899999999996  4424.6199999999999  4570.2600000000002  3874.3600000000001  4516.4099999999999  4412.3100000000004  4117.4399999999996  4215.3800000000001  4300.0  4433.8500000000004  4065.6399999999999  4394.3599999999997  1728.0  1675.0  1517.27 1363.23 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0   0.0 0   0.0

I'd be grateful for any suggestions as to what I'm doing wrong. I'm using the standard installation of Python (2.7.3) and pandas (0.8.0) from the Ubuntu 12.10 repositories.

EDIT: I think it is a bug: https://github.com/pydata/pandas/issues/2069. Thanks to user1827356 I found the float_format argument of the to_csv method, but to make it work I had to install a newer version of pandas, since it did not work in the default pandas 0.8 shipped with Ubuntu 12.10. It's OK now. Thanks!

yemu

2 Answers

1

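What you see in your output CSV are the same values, printed with higher precision. Most decimal fractions, such as 4174.3865, have no exact binary floating-point representation, so writing the stored value with 17 significant digits exposes the nearest representable number instead of the short form you typed.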
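A minimal sketch of this point in plain Python, without pandas. The three values come from the "before" row in the question; the helper name long_form is made up for illustration, and the '%.17g' format is an assumption about how to show a float at full precision, not a claim about what pandas 0.8 does internally.

```python
# Values taken from the "before" row in the question.
values = [4174.3865, 4243.59, 4247.69]


def long_form(x):
    # 17 significant digits are enough to identify any 64-bit float uniquely.
    return '%.17g' % x


for x in values:
    text = long_form(x)
    # Parsing the long form gives back exactly the same float,
    # so no information was gained or lost, only more digits printed.
    assert float(text) == x
    print(repr(x), '->', text)
```

The long and short spellings parse to the same float, which is why the file contents look different while the data is unchanged.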

Maxim Egorushkin
1

For example, pass float_format to to_csv to fix the number of decimal places:

df.to_csv('pandasfile.csv', float_format='%.3f')
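A slightly fuller round-trip sketch, assuming a reasonably recent pandas (the question notes that float_format did not work in 0.8.0). The column names and the second row are invented for illustration, and index=False is an extra option not mentioned in the answer: the additional leading column in the question's output looks like the row index, which to_csv writes by default.

```python
import io

import pandas as pd

# Two tab-separated rows standing in for the question's file.
# The header names and the second row are made up for this example.
raw = (
    "id\tprice\tqty\n"
    "64437311025\t4174.3865\t110.00\n"
    "64437311026\t4243.59\t0.00\n"
)

df = pd.read_csv(io.StringIO(raw), sep="\t")

out = io.StringIO()
# float_format applies one printf-style format to every float column;
# integer columns such as "id" are left alone.
# index=False drops the leading row-number column.
df.to_csv(out, sep="\t", float_format="%.4f", index=False)

result = out.getvalue()
print(result)
```

Note that a single format applies to all float columns, so 110.00 comes back as 110.0000 here. If the columns need different precisions, one option is to format them individually before writing.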