Pandas - compare loaded data to processed data

Question

I'm writing a unit test that compares known results saved in a CSV file with freshly processed results. The test failed even though the data were equal. My guess was that pandas rounds the values in some way, so I created the following snippet to test this:

import pandas as pd
df = pd.DataFrame({'val':[-0.41676538151302184]})
df.to_csv('tmp.csv',index=False)
# Load the saved CSV and compare it
df2 = pd.read_csv('./tmp.csv')
df2.val.compare(df.val)

       self     other
0 -0.416765 -0.416765

pandas reports differences, although the values appear to be equal. If I round the values, the comparison succeeds.

What is the right way to compare saved data to calculated data?

Does [this](https://stackoverflow.com/questions/47368296/pandas-read-csv-file-with-float-values-results-in-weird-rounding-and-decimal-dig) help? — Lucas, Jan 13 '22 at 08:49
Yes, thanks a lot! float_precision='round_trip' is a good solution — Shlomi Schwartz, Jan 13 '22 at 08:54
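A minimal sketch of the `float_precision='round_trip'` fix from the comments, applied to the snippet in the question. It writes to an in-memory `io.StringIO` buffer instead of `tmp.csv` (an assumption made only to avoid touching the file system); with a real file you would pass the path as before.

```python
import io
import pandas as pd

df = pd.DataFrame({'val': [-0.41676538151302184]})

# to_csv() without a path returns the CSV text instead of writing a file
csv_text = df.to_csv(index=False)

# 'round_trip' tells the C parser to use Python's own float conversion,
# so the digits written by to_csv are read back as the identical double
df2 = pd.read_csv(io.StringIO(csv_text), float_precision='round_trip')

print(df2.val.compare(df.val).empty)  # expected: True (no differences)
print(df2.equals(df))                 # expected: True
```

The trade-off is speed: the round-trip converter is slower than the default one, which usually does not matter for test fixtures.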

mozway · Accepted Answer · 2022-01-13T09:04:12.613


The issue with floating point numbers is precision. As you guessed, your numbers are very close but not exactly identical:

df.iloc[0,0]
-0.41676538151302184

df2.iloc[0,0]
-0.4167653815130218

with pd.option_context('display.float_format', '{:.20f}'.format):
    print(df2.val.compare(df.val))  # display(...) also works, but only in Jupyter/IPython

                     self                   other
0 -0.41676538151302178203 -0.41676538151302183755
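To make "very close but not exactly identical" concrete, the sketch below (plain Python, assuming Python 3.9+ for `math.nextafter` and `math.ulp`) checks that the two values are neighbouring doubles, i.e. they differ by exactly one unit in the last place (ULP).

```python
import math

a = -0.41676538151302184  # value held in memory (df)
b = -0.4167653815130218   # value read back from the CSV (df2)

# The two literals map to different doubles...
print(a == b)                       # False
# ...and b is the very next double after a, moving towards zero
print(math.nextafter(a, 0.0) == b)  # True
# so their gap is exactly one ULP (about 5.55e-17 at this magnitude)
print(abs(a - b) == math.ulp(a))    # True
```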

One option is to use numpy.isclose or numpy.allclose, which are specifically designed to test whether numbers are close. Their rtol and atol parameters let you specify a custom relative or absolute tolerance.

import numpy as np
np.isclose(df, df2).all()

# or 
np.allclose(df, df2)

output: True
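Since the goal is a unit test, here is a sketch of the same idea with explicit tolerances, plus `pandas.testing.assert_frame_equal`, which raises an `AssertionError` with a readable diff on failure. It assumes pandas >= 1.1 (where `assert_frame_equal` accepts `rtol`/`atol`); the names `expected` and `result` are hypothetical stand-ins for the loaded CSV and the processed data.

```python
import numpy as np
import pandas as pd
from pandas.testing import assert_frame_equal

expected = pd.DataFrame({'val': [-0.41676538151302184]})  # e.g. loaded from the CSV
result = pd.DataFrame({'val': [-0.4167653815130218]})     # e.g. freshly computed

# Exact comparison fails because the doubles differ in the last bit
exact_equal = expected.equals(result)

# Tolerance-based comparison passes; a tight rtol still catches real changes
close = np.allclose(expected, result, rtol=1e-12, atol=0.0)

# assert_frame_equal with check_exact=True raises on the 1-ULP difference...
try:
    assert_frame_equal(expected, result, check_exact=True)
    strict_ok = True
except AssertionError:
    strict_ok = False

# ...while a relative tolerance lets it pass (raises AssertionError otherwise)
assert_frame_equal(expected, result, rtol=1e-12)

print(exact_equal, close, strict_ok)  # False True False
```

Picking the tolerance is a judgement call: it should be loose enough to absorb CSV round-off but tight enough that a genuine regression in the processing still fails the test.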

edited Jan 13 '22 at 09:04

answered Jan 13 '22 at 08:51



Using `pd.read_csv(..., float_precision='round_trip')` keeps the last decimal if the precision is required. – Lucas Jan 13 '22 at 08:53

Can you add `with pd.option_context('display.float_format', '{:.20f}'.format): display(df2.val.compare(df.val))`? (and congrats on your 40k) – Corralien Jan 13 '22 at 08:58
