
I haven't been able to replicate this with a minimal example, but maybe I can try to explain it. I have a function like this:

import pandas as pd

def myfile():
    A = pd.read_csv('myfile.csv')
    # [some processing]
    A.to_csv('myfile2.csv')
    return A

Now the problem is that if I do

t1 = myfile()
t2 = pd.read_csv('myfile2.csv')

they end up returning different results! I saved both t1 and t2 and did a diff on them, only to find that they differ in the floating-point values, like this

2c2
< A,-61.54871999999999,-30.01167
---
> A,-61.54871999999997,-30.01167
5c5

Unfortunately, the saved version gives me the "correct" results. Why would the return values and the read_csv differ?

[There are similar questions, but not exactly this: see here, for example]

Dervin Thunk
  • Maybe not exactly a duplicate, but I can't read a question which contrasts `-61.54871999999999` and `-61.54871999999997` without thinking that this is probably a disguised version of [is floating point math broken?](https://stackoverflow.com/q/588004/4996248). – John Coleman May 06 '20 at 01:33
  • @JohnColeman: I know what you mean, and you're right, the problem is that I'd just like for `t1` and `t2` to return the same DataFrame, that's all... – Dervin Thunk May 06 '20 at 01:39
  • 1
    My guess is that `to_csv` rounds a double to a certain precision (in base 10). This entails a loss of information, information which `read_csv` can't recover. That a typical csv is a lossy way of storing floating point data is a known issue. I don't fully understand the issue, but [this answer](https://stackoverflow.com/a/47368368/4996248) to a related question looks relevant. – John Coleman May 06 '20 at 01:41
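The lossy round trip the last comment describes can be sketched as follows. This is a minimal illustration, not the asker's actual data: the column name and value are made up, and it assumes the precision loss comes from writing floats with a limited `float_format` (by default, modern pandas writes the full shortest repr, and `read_csv(..., float_precision='round_trip')` parses it back exactly).

```python
import io
import pandas as pd

# A value like the ones in the question's diff (made up for illustration).
df = pd.DataFrame({"x": [-61.54871999999997]})

# Writing with a limited precision (here 12 significant digits) drops
# trailing digits, so the value read back is no longer bit-identical.
buf = io.StringIO()
df.to_csv(buf, index=False, float_format="%.12g")
buf.seek(0)
lossy = pd.read_csv(buf)
print(lossy["x"].iloc[0] == df["x"].iloc[0])  # False: information was lost

# With the default full-precision write and round-trip parsing on read,
# the value survives exactly.
buf = io.StringIO()
df.to_csv(buf, index=False)
buf.seek(0)
exact = pd.read_csv(buf, float_precision="round_trip")
print(exact["x"].iloc[0] == df["x"].iloc[0])  # True
```

If the asker's pipeline (or an older pandas version) formats floats at limited precision on the way out, `t1` keeps the original doubles while `t2` only sees the truncated decimal strings, which would produce exactly the kind of last-digit diffs shown above.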

0 Answers