I'm trying to figure out whether this is an issue or whether I am doing something wrong.
I'm trying to compare two DataFrames with one row each, and the comparison fails on the NaN fields.
For example, I have these two rows. The first is the original, and the second comes from some filtering, but in the end it is the same row.
The initial one:
> df.loc[3968]
col1 Name
col2 201800000953
col3 0000002
col4 Please disconnec...
col5 NaN
[more columns]
Name: 3968, dtype: object
The same row after some filtering:
> df2.head(1)
col1 Name
col2 201800000953
col3 0000002
col4 Please disconnec...
col5 NaN
[more columns]
Name: 3968, dtype: object
These are the types:
> type(df2.head(1).col5.values[0])
numpy.float64
> type(df.loc[3968].col5)
numpy.float64
> df.dtypes
col1 object
col2 int64
col3 int64
col4 object
col5 float64
> df2.dtypes
col1 object
col2 int64
col3 int64
col4 object
col5 float64
This is what I get when comparing them:
> df2.head(1).equals(df.loc[3968])
False
> df2.iloc[0].equals(df.loc[3968])
True
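A minimal, self-contained sketch of the `.equals` difference, using a small hypothetical frame in place of the real data (the column subset and values are made up). It prints the type each indexer returns: `.head(1)` gives a one-row DataFrame, while `.iloc[0]` and `.loc[3968]` give a Series. Per the pandas docs, `.equals` returns False when the two objects have different types, and treats NaNs in the same location as equal.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df; the real frame has more columns.
df = pd.DataFrame(
    {"col1": ["Name"], "col2": [201800000953], "col5": [np.nan]},
    index=[3968],
)
df2 = df.copy()  # hypothetical stand-in for the filtered frame

print(type(df2.head(1)))   # a DataFrame (one row)
print(type(df2.iloc[0]))   # a Series
print(type(df.loc[3968]))  # a Series

# DataFrame vs Series: different types, so equals() is False.
print(df2.head(1).equals(df.loc[3968]))  # False
# Series vs Series: NaN in the same position counts as equal.
print(df2.iloc[0].equals(df.loc[3968]))  # True
```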
> df2.head(1) == df.loc[3968]
col1 True
col2 True
col3 True
col4 True
col5 False
Name: 3968, dtype: bool
> df2.iloc[0] == df.loc[3968]
col1 True
col2 True
col3 True
col4 True
col5 False
Name: 3968, dtype: bool
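A short sketch of why `col5` comes out False with `==`, using two hypothetical Series (`a` and `b` are made-up names). Element-wise `==` follows IEEE 754 semantics, where NaN is never equal to NaN. One common workaround, shown here as an assumption rather than the only option, is to also count positions where both sides are missing as matches.

```python
import numpy as np
import pandas as pd

# Hypothetical rows with a missing value in col5.
a = pd.Series({"col1": "Name", "col5": np.nan}, name=3968)
b = pd.Series({"col1": "Name", "col5": np.nan}, name=3968)

print(np.nan == np.nan)   # False: NaN never equals NaN
print((a == b).tolist())  # [True, False]

# Count a position as equal if the values match or both are missing.
nan_aware = (a == b) | (a.isna() & b.isna())
print(nan_aware.tolist())  # [True, True]
```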
As you can see, it fails when the values are NaN. But if I replace the NaN values with something else, it doesn't fail.
> df2.fillna(0).iloc[0].equals(df.loc[3968].fillna(0))
True
> df2.head(1).fillna(0) == df.loc[3968].fillna(0)
col1 True
col2 True
col3 True
col4 True
col5 True
Name: 3968, dtype: bool
> df2.iloc[0].fillna(0) == df.loc[3968].fillna(0)
col1 True
col2 True
col3 True
col4 True
col5 True
Name: 3968, dtype: bool
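One caveat with the `fillna(0)` workaround, illustrated with two hypothetical Series: filling with 0 also makes a genuine 0 compare equal to a missing value, so a real difference can be hidden.

```python
import numpy as np
import pandas as pd

a = pd.Series([1.0, np.nan], name="col5")  # hypothetical: value is missing
b = pd.Series([1.0, 0.0], name="col5")     # hypothetical: value is a real 0

print(a.equals(b))                      # False: NaN and 0.0 differ
print(a.fillna(0).equals(b.fillna(0)))  # True: the difference is masked
```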
Why does the behavior change? Aren't `.head(1)` and `.iloc[0]` the same thing?
I thought `.head()` might be casting NaN values to str, but that isn't the case, because:
> type(df2.head(1).col5.values[0])
numpy.float64
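A sketch of a like-for-like comparison, again with a hypothetical frame. Passing a list label, `df.loc[[3968]]`, returns a one-row DataFrame instead of a Series, so both sides of `.equals` have the same type. Assuming the dtypes and index match (as the `dtypes` output above suggests), the NaN in `col5` should then not cause a mismatch.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for df and the filtered df2.
df = pd.DataFrame(
    {"col1": ["Name"], "col2": [201800000953], "col5": [np.nan]},
    index=[3968],
)
df2 = df.copy()

row_df = df.loc[[3968]]  # list label -> one-row DataFrame
print(type(row_df))      # a DataFrame, same type as df2.head(1)

# DataFrame vs DataFrame with matching dtypes and index.
print(df2.head(1).equals(row_df))  # True
```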