
I'm trying to figure out whether this is a bug or whether I am doing something wrong.

I'm trying to compare two dataframes of one row each, and the comparison fails on the NaN fields.

For example, I have these two rows. The first is the original, and the second comes from some filtering, but in the end it is the same row.

The initial one.

> df.loc[3968]
col1                                       Name
col2                               201800000953
col3                                    0000002
col4                        Please disconnec...
col5                                        NaN
[more columns]
Name: 3968, dtype: object

The same after some filtering.

> df2.head(1)
col1                                        Name
col2                                201800000953
col3                                     0000002
col4                         Please disconnec...
col5                                         NaN
[more columns]
Name: 3968, dtype: object

These are the types:

> type(df2.head(1).col5.values[0])    
numpy.float64

> type(df.loc[3968].col5)
numpy.float64

> df.dtypes    
col1                         object
col2                          int64
col3                          int64
col4                         object
col5                        float64

> df2.dtypes    
col1                         object
col2                          int64
col3                          int64
col4                         object
col5                        float64

This is what I get when I compare them:

> df2.head(1).equals(df.loc[3968])
False

> df2.iloc[0].equals(df.loc[3968])
True

> df2.head(1) == df.loc[3968]
col1                         True
col2                         True
col3                         True
col4                         True
col5                         False
Name: 3968, dtype: bool

> df2.iloc[0] == df.loc[3968]
col1                         True
col2                         True
col3                         True
col4                         True
col5                        False
Name: 3968, dtype: bool

As you can see, it fails when the values are NaN. But if I replace the NaN values with something else, it doesn't fail.

> df2.fillna(0).iloc[0].equals(df.loc[3968].fillna(0))
True

> df2.head(1).fillna(0) == df.loc[3968].fillna(0)
col1                        True
col2                        True
col3                        True
col4                        True
col5                        True
Name: 3968, dtype: bool

> df2.iloc[0].fillna(0) == df.loc[3968].fillna(0)
col1                        True
col2                        True
col3                        True
col4                        True
col5                        True
Name: 3968, dtype: bool

Why does the behavior change? Isn't using .head(1) the same as using .iloc[0]?

I thought .head() might be casting the NaN values to str, but that is not the case, because:

> type(df2.head(1).col5.values[0])    
numpy.float64  
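
For reference, here is a minimal sketch of the .head(1) versus .iloc[0] comparison on made-up data. The frame below is a hypothetical stand-in with only three of the columns, and `df2` is just a copy rather than the result of real filtering. It suggests that the difference may come from the container type rather than from the NaN: .head(1) gives back a one-row DataFrame, while .iloc[0] and .loc[3968] give back a Series, and .equals() returns False when one side is a DataFrame and the other is a Series.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the data in the question (only three columns).
df = pd.DataFrame(
    {"col1": ["Name"], "col2": [201800000953], "col5": [np.nan]},
    index=[3968],
)
df2 = df.copy()  # assumption: stands in for the filtered frame

row_as_frame = df2.head(1)   # one-row DataFrame
row_as_series = df2.iloc[0]  # Series
original = df.loc[3968]      # Series

print(type(row_as_frame).__name__, type(row_as_series).__name__)

# DataFrame vs Series: .equals() is False regardless of the values.
frame_vs_series = row_as_frame.equals(original)

# Series vs Series: .equals() treats NaNs in the same position as equal.
series_vs_series = row_as_series.equals(original)

# DataFrame vs DataFrame: select the original row with a list label
# so that it stays a one-row DataFrame.
frame_vs_frame = row_as_frame.equals(df.loc[[3968]])

print(frame_vs_series, series_vs_series, frame_vs_frame)
```

If this holds for the real data, comparing like with like (`df2.head(1)` against `df.loc[[3968]]`, or `df2.iloc[0]` against `df.loc[3968]`) should make .equals() agree in both cases.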
Ricardohs
  • have a look at [this](https://stackoverflow.com/questions/19322506/pandas-dataframes-with-nans-equality-comparison) – Adithya Dec 18 '18 at 12:34
  • yeah, I have read it before. But in that post there is nothing about NaN behavior when using .head() or .loc[] – Ricardohs Dec 18 '18 at 15:18
  • NaN behavior is always the same, `NaN==NaN` always gives `False`. see [this](https://stackoverflow.com/questions/1565164/what-is-the-rationale-for-all-comparisons-returning-false-for-ieee754-nan-values) for more information. – Adithya Dec 19 '18 at 04:48
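
The point made in the comments can be checked with a short sketch on two made-up Series (not the data from the question). Element-wise `==` follows IEEE 754, so a NaN never compares equal to a NaN, whereas .equals() counts NaNs in the same position as equal. One possible way to get an element-wise result that is NaN-aware is to OR the plain comparison with a mask of positions where both sides are missing; this is only one option, not necessarily the recommended one.

```python
import numpy as np
import pandas as pd

# Made-up values, for illustration only.
a = pd.Series([1.0, np.nan, 3.0])
b = pd.Series([1.0, np.nan, 3.0])

# Plain element-wise comparison: the NaN position comes out False.
plain = a == b

# NaN-aware element-wise comparison: equal values, or both missing.
nan_aware = (a == b) | (a.isna() & b.isna())

# Whole-object comparison: NaNs in matching positions count as equal.
whole = a.equals(b)

print(plain.tolist())
print(nan_aware.tolist())
print(whole)
```

This matches what the question observes: `==` reports False for col5 even when both rows hold NaN there, while .equals() on two Series reports True.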
