I've been trying to replace missing values in a pandas DataFrame, but without success. I tried the .fillna method and also tried looping through the entire data set, checking each cell and replacing NaNs with a chosen value. In both cases, Python runs the script without raising any errors, but the NaN values remain.
When I dug a bit deeper, I discovered behaviour that seems erratic to me, best demonstrated with an example:
In [ ] X['Smokinginpregnancy'].head()
Out [ ]
Index
E09000002 NaN
E09000003 5.216126
E09000004 10.287496
E09000005 3.090379
E09000006 6.080041
Name: Smokinginpregnancy, dtype: float64
I know for a fact that the first item in this column is missing, and pandas recognises it as NaN. In fact, if I access this item on its own, Python tells me it's NaN:
In [ ] X['Smokinginpregnancy'][0]
Out [ ]
nan
However, when I test whether it's NaN by comparing it with np.nan, Python returns False.
In [ ] X['Smokinginpregnancy'][0] == np.nan
Out [ ] False
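A minimal sketch of this comparison on a small stand-in Series (the data and the variable names below are made up for illustration, not taken from the real data set). Under IEEE 754 floating-point rules NaN never compares equal to anything, including itself, so `== np.nan` is expected to give False; `pd.isna` and `np.isnan` are the usual ways to test for a missing value:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the first few rows of the column above
s = pd.Series(
    [np.nan, 5.216126, 10.287496],
    index=["E09000002", "E09000003", "E09000004"],
    name="Smokinginpregnancy",
)

first = s.iloc[0]  # positional access to the first item

eq_result = first == np.nan     # equality with NaN is always False
isna_result = pd.isna(first)    # pandas' missing-value test
isnan_result = np.isnan(first)  # NumPy's NaN test for floats

print(eq_result, isna_result, isnan_result)
```

So the False result by itself does not show that the value is anything other than NaN.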
I suspect that when .fillna is executed, Python checks whether the item is NaN, gets back False, and so moves on, leaving the cell alone.
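One thing that may be worth ruling out, sketched here on a hypothetical frame (the data and the fill value 0 are assumptions for illustration): by default `fillna` returns a new object and leaves the original untouched, so the result has to be assigned back for the change to be visible.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real DataFrame X
X = pd.DataFrame(
    {"Smokinginpregnancy": [np.nan, 5.216126, 10.287496]},
    index=["E09000002", "E09000003", "E09000004"],
)

# Result is discarded here, so X itself is unchanged
X["Smokinginpregnancy"].fillna(0)
still_missing = int(X["Smokinginpregnancy"].isna().sum())

# Assigning the returned Series back replaces the NaN
X["Smokinginpregnancy"] = X["Smokinginpregnancy"].fillna(0)
remaining = int(X["Smokinginpregnancy"].isna().sum())

print(still_missing, remaining)
```

If the script calls `fillna` without keeping the returned value, it would run without errors and the NaNs would remain, which matches the symptom described.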
Does anyone know what's going on? Are there any solutions (apart from opening the CSV file in Excel and manually replacing the values)?
I'm using Anaconda's Python 3 distribution.