
I'm writing a unit test that compares known results saved in a CSV file with freshly processed results. The test failed although the data appeared to be equal. My guess was that Pandas rounds the values in some way, so I created the following snippet to test this:

import pandas as pd
df = pd.DataFrame({'val':[-0.41676538151302184]})
df.to_csv('tmp.csv',index=False)
# Load the saved CSV and compare it
df2 = pd.read_csv('./tmp.csv')
df2.val.compare(df.val)

       self     other
0 -0.416765 -0.416765

Pandas reports a difference, although the values appear to be equal. If I round the values, the comparison succeeds.

What would be the right way to compare the saved data with the calculated data?

Shlomi Schwartz

1 Answer


The issue with floating point numbers is precision. As you guessed, your numbers are very close but not exactly identical:

df.iloc[0,0]
-0.41676538151302184

df2.iloc[0,0]
-0.4167653815130218

with pd.option_context('display.float_format', '{:.20f}'.format):
    display(df2.val.compare(df.val))

                     self                   other
0 -0.41676538151302178203 -0.41676538151302183755

One option is to use numpy.isclose or numpy.allclose, which are designed to test whether numbers are approximately equal. Both accept two parameters, rtol and atol, to set a custom relative or absolute tolerance.

import numpy as np
np.isclose(df, df2).all()

# or 
np.allclose(df, df2)

output: True
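
For the unit-test case in the question, here is a minimal sketch of how the tolerances could be set explicitly. The names `computed` and `loaded` are illustrative, and the two values are the ones printed above. `pd.testing.assert_frame_equal` is not mentioned in the original answer; it is shown as one possible alternative, assuming a pandas version that supports its `rtol` and `atol` parameters (1.1 or later).

```python
import numpy as np
import pandas as pd

# The two values from the question: the computed one and the one read back from the CSV.
computed = pd.DataFrame({'val': [-0.41676538151302184]})
loaded = pd.DataFrame({'val': [-0.4167653815130218]})

# Exact equality fails because the two floats differ in their last digits.
exact_equal = bool((computed['val'] == loaded['val']).all())

# A tighter-than-default tolerance: relative 1e-12, no absolute tolerance.
close_enough = bool(np.allclose(computed.to_numpy(), loaded.to_numpy(),
                                rtol=1e-12, atol=0.0))

# In a test, assert_frame_equal raises AssertionError with a descriptive
# message on mismatch; here it passes because the values are within tolerance.
pd.testing.assert_frame_equal(computed, loaded, rtol=1e-12, atol=0.0)
```

Compared with a bare `np.allclose`, `assert_frame_equal` also checks column names, index, and dtypes, which may or may not be what a given test wants.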

mozway
    Using `pd.read_csv(..., float_precision='round_trip')` keeps the last decimal if the precision is required. – Lucas Jan 13 '22 at 08:53
    Can you add `with pd.option_context('display.float_format', '{:.20f}'.format): display(df2.val.compare(df.val))`? (and congrats for your 40k) – Corralien Jan 13 '22 at 08:58
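
The `float_precision='round_trip'` option from Lucas's comment can be sketched as follows. This is an illustration only: it writes to an in-memory `io.StringIO` buffer instead of `tmp.csv` to stay self-contained, and it assumes the default C parser engine, which is the one that supports this option.

```python
import io

import pandas as pd

df = pd.DataFrame({'val': [-0.41676538151302184]})

# Write the CSV to an in-memory buffer (stand-in for tmp.csv in the question).
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)

# 'round_trip' parses floats so that the written text maps back to the same value.
df2 = pd.read_csv(buffer, float_precision='round_trip')

# With the round-trip parser the reloaded column matches the original exactly.
round_trip_equal = df2['val'].equals(df['val'])
```

This keeps exact comparison possible, at the cost of slower parsing; a tolerance-based comparison remains the more robust choice when the expected values come from a different computation.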