How do I read this assertion error for comparing two pandas series?

Question

I'm writing unit tests for a function that corrects values. One of the tests fails with assert False, even though the output appears to be exactly the same as the expected output. Can someone see what I'm missing?

This is the error message:

E       assert False
E        +  where False = <bound method NDFrame.equals of 0    29.0\n1    44.0\n2    56.0\nName: interest2, dtype: float64>(0    29.0\n1    44.0\n2    56.0\nName: interest2, dtype: float64)
E        +    where <bound method NDFrame.equals of 0    29.0\n1    44.0\n2    56.0\nName: interest2, dtype: float64> = 0    29.0\n1    44.0\n2    56.0\nName: interest2, dtype: float64.equals

test_func.py:50: AssertionError

This is the input data:

pd.DataFrame({'interest2': ['0.29','0.44','0.56']})

This is the expected output data:

pd.DataFrame({'interest2': ['29','44','56']})

This is the unit test:

def test_correct_input_values_interest2(input, output):
    x = formatData()
    output_a = x.correct_values(input['interest2'])
    dummy_output = output['interest2']
    dummy_output = dummy_output.astype(float)
    assert output_a.equals(dummy_output)

This is the output:

0    29.0
1    44.0
2    56.0
Name: interest2, dtype: float64
0    29.0
1    44.0
2    56.0
Name: interest2, dtype: float64

This is the metadata of the output:

<class 'pandas.core.series.Series'>
RangeIndex: 3 entries, 0 to 2
Series name: interest2
Non-Null Count  Dtype  
--------------  -----  
3 non-null      float64
dtypes: float64(1)
memory usage: 152.0 bytes
None
<class 'pandas.core.series.Series'>
RangeIndex: 3 entries, 0 to 2
Series name: interest2
Non-Null Count  Dtype  
--------------  -----  
3 non-null      float64
dtypes: float64(1)
memory usage: 152.0 bytes
None

This is the function being tested:

def correct_values(self, column: pd.Series):
    # if a value is 100 or greater, replace value with value / 100
    column = column.apply(
        lambda x: float(x) if float(x) < 100 else float(x) / 100
    )

    # if all values are less than 1, multiply all values by 100
    value_validation = pd.unique(column < 1)

    # if the length is 1, all values are True or all values are False
    if len(value_validation) == 1 and value_validation in [True]:
        column = column.apply(
            lambda x: x * 100
        )

    # convert column to float
    column = column.astype(float)

    return column

float(0.29) * 100 != 29.0. In fact, float(0.29) is not exactly 0.29, simply because converting to float rounds the number to 53 significant binary digits. - Mad Physicist, Aug 27 '22 at 16:33
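To see what the comment means, Python's standard decimal module can show the exact binary value a float actually stores. A minimal sketch using only the standard library:

```python
from decimal import Decimal

# Decimal(float) reveals the exact value held by the double, not the literal typed
stored = Decimal(0.29)
print(stored)                      # slightly less than 0.29
print(stored == Decimal("0.29"))   # False: 0.29 has no exact binary representation

# The representation error survives the multiplication by 100
product = 0.29 * 100
print(repr(product))               # 28.999999999999996
print(product == 29.0)             # False
```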

score 1 · Accepted Answer · answered Aug 27 '22 at 16:27

The problem here is the comparison of floats. Because of floating-point rounding during the calculation, the values are not exactly equal. If you print them at full precision, you will see something like:

dummy_output: (0, 29.0) (1, 44.0) (2, 56.0)
output_a:     (0, 28.999999999999996) (1, 44.0) (2, 56.00000000000001)
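One way to see those digits yourself is to print the values as Python floats instead of through the Series display, which pandas rounds to 6 digits after the decimal point by default. A minimal sketch, assuming the string input from the question; the vectorized * 100 is a simplified stand-in for the apply calls in correct_values and performs the same per-element float multiplication:

```python
import pandas as pd

# Same steps as correct_values for this input: parse the strings, then scale by 100
output_a = pd.Series(["0.29", "0.44", "0.56"], name="interest2").astype(float) * 100
dummy_output = pd.Series(["29", "44", "56"], name="interest2").astype(float)

# The Series display rounds for readability, so both series look identical
print(output_a)

# tolist() returns Python floats, whose repr shows full round-trip precision
print(output_a.tolist())                   # [28.999999999999996, 44.0, 56.00000000000001]
print((output_a - dummy_output).tolist())  # tiny non-zero differences at 0 and 2
print(output_a.equals(dummy_output))       # False
```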

Generally, it is not a good idea to compare floats without a tolerance. For pandas.Series there is already a convenience function that does this comparison, pandas.testing.assert_series_equal. By default it compares float values approximately, with a relative tolerance of 1e-5 (sufficient in most cases) and an absolute tolerance of 1e-8; you can change these with the rtol and atol parameters.
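A short sketch of how the tolerance behaves on the values from the question. The series are built from float literals here, which gives the same values as parsing the strings:

```python
import pandas as pd

output_a = pd.Series([0.29, 0.44, 0.56], name="interest2") * 100
dummy_output = pd.Series([29.0, 44.0, 56.0], name="interest2")

# Exact comparison fails, just like Series.equals in the question
print(output_a.equals(dummy_output))  # False

# For float data the defaults are check_exact=False, rtol=1e-5, atol=1e-8, so this passes
pd.testing.assert_series_equal(output_a, dummy_output)

# The error is tiny, so a much tighter relative tolerance also passes
pd.testing.assert_series_equal(output_a, dummy_output, rtol=1e-12)

# Forcing an exact comparison reproduces the original failure
exact_failed = False
try:
    pd.testing.assert_series_equal(output_a, dummy_output, check_exact=True)
except AssertionError as err:
    exact_failed = True
    print("exact comparison failed:", err)
```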

Here's your corrected test that should pass:

def test_correct_input_values_interest2(input, output):
    x = formatData()
    output_a = x.correct_values(input['interest2'])
    dummy_output = output['interest2']
    dummy_output = dummy_output.astype(float)
    pd.testing.assert_series_equal(output_a, dummy_output)
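Because the fixtures and the formatData class are not shown in the question, here is a self-contained sketch of the corrected test. FormatData, make_input and make_output are hypothetical stand-ins, and correct_values is a simplified restatement of the function above ((column < 1).all() replaces the pd.unique check). It can be run with pytest or directly with python:

```python
import pandas as pd


class FormatData:
    """Hypothetical stand-in for the question's formatData class."""

    def correct_values(self, column: pd.Series) -> pd.Series:
        # Parse to float; values of 100 or more are divided by 100
        column = column.apply(lambda x: float(x) if float(x) < 100 else float(x) / 100)
        # If every value is below 1, treat them as fractions and scale by 100
        if (column < 1).all():
            column = column * 100
        return column.astype(float)


def make_input() -> pd.DataFrame:
    # Hypothetical replacement for the "input" fixture
    return pd.DataFrame({"interest2": ["0.29", "0.44", "0.56"]})


def make_output() -> pd.DataFrame:
    # Hypothetical replacement for the "output" fixture
    return pd.DataFrame({"interest2": ["29", "44", "56"]})


def test_correct_input_values_interest2():
    output_a = FormatData().correct_values(make_input()["interest2"])
    dummy_output = make_output()["interest2"].astype(float)
    # Tolerant comparison instead of the exact Series.equals
    pd.testing.assert_series_equal(output_a, dummy_output)


if __name__ == "__main__":
    test_correct_input_values_interest2()
    print("passed")
```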
