
In a particular DataFrame I have a column called "Wind" that gives the wind energy production per year in Germany. At the beginning of the series the production is so small that the value is recorded as NaN (no data available). Only from 2010 onward do I have data for wind.

DATA link for copy and paste: API_link_to_data='https://raw.githubusercontent.com/jenfly/opsd/master/opsd_germany_daily.csv'

energyDF = pd.read_csv(API_link_to_data)

Now the following code compares the column with itself element-wise:

energyDF.loc[:,'Wind'] == energyDF['Wind']

I expected the result to be [True, True, True, ..., True], but this is not the case.

The result is False for all the NaN values, even though element-wise they look the same:

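# the same column, selected in two different ways
wind_col1 = energyDF.loc[:,'Wind']
wind_col2 = energyDF['Wind']
print(wind_col1[0])
print(wind_col2[0])
print(wind_col1[0] == wind_col2[0])
print(wind_col1[0] == np.nan)
print(wind_col2[0] == np.nan)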

Result: nan nan False False False

Expected: nan nan True True True

However, after replacing the NaN values in the whole frame with 0:

energyDF=energyDF.fillna(0)

then

energyDF.loc[:,'Wind'] == energyDF['Wind']

is a Series full of True values.

Could someone explain that?

Thanks

JFerro

1 Answer


NaNs are not equal to themselves: under IEEE 754 floating-point rules, any == comparison involving NaN is False. See: Why is NaN not equal to NaN?
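A minimal sketch of that behaviour, using a small made-up Series rather than the wind data (the variable names here are only illustrative). It also shows that a missing value is detected with isna, not with ==:

```python
import math

import numpy as np
import pandas as pd

nan_value = np.nan

# Comparing NaN with == is always False, even against itself.
self_equal = bool(nan_value == nan_value)

# Missing values are detected with dedicated checks instead.
is_missing = bool(pd.isna(nan_value))
also_missing = math.isnan(nan_value)

# The same rule applies element-wise to a Series (hypothetical toy data).
s = pd.Series([np.nan, 1.5, np.nan])
elementwise_eq = (s == s).tolist()
missing_mask = s.isna().tolist()

print(self_equal, is_missing, also_missing)
print(elementwise_eq)
print(missing_mask)
```

This is why wind_col1[0] == np.nan in the question prints False: the check that was intended is pd.isna(wind_col1[0]).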

As for checking whether energyDF.loc[:,'Wind'] and energyDF['Wind'] are equal, you could call fillna on both sides with the same value (preferably one that doesn't occur in the series) and then check that the two results are identical.

As an example:

>>> df
    ID Col1
0  1.0   AD
1  NaN   BC
2  3.0   CE
>>> (df.loc[:, 'ID'] == df['ID']).all()
False
>>> (df.loc[:, 'ID'].fillna("Non-existent") == df['ID'].fillna("Non-existent")).all()
True
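If you would rather not pick a placeholder value, a possible alternative (not part of the original approach above) is Series.equals, which treats NaNs in the same position as equal, or an element-wise mask combined with isna. A small sketch on made-up data, with hypothetical names left and right:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the two selections of the column.
left = pd.Series([1.0, np.nan, 3.0])
right = left.copy()

# Plain == fails on the NaN position.
naive_all = bool((left == right).all())

# Series.equals considers NaNs in the same location to be equal.
same_series = left.equals(right)

# Element-wise version: equal values, or missing on both sides.
nan_aware = (left == right) | (left.isna() & right.isna())

print(naive_all)
print(same_series)
print(nan_aware.tolist())
```

Series.equals gives a single True/False for the whole Series, while the mask keeps a per-row result if you need to see where the two sides differ.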
Asish M.