
I am trying to replace the NaN values in the dataframe column 'Functional' using the fillna() function. These are the issues I am facing:

  1. I am able to detect the null values using isnull():

dfcomp[dfcomp['Functional'].isnull()==True]


  2. Using the index from the output above, I looked up the actual value:

dfcomp['Functional'][2216]


  3. But when I try to fill the NaN using fillna(), nothing happens. Even after running the fillna statement, I can rerun the first statement and still see the same 2 NaN instances.

dfcomp['Functional']=dfcomp['Functional'].fillna(value=dfcomp['Functional'].mode())

I have tried both versions, by the way:

dfcomp['Functional'].fillna(value=dfcomp['Functional'].mode(),inplace=True)


  4. I also tried the replace() function for this, but no luck:

dfcomp['Functional']=dfcomp['Functional'].replace({'nan':dfcomp['Functional'].mode()})

Is there something wrong with my code? Why is fillna() not recognizing the NaN when isnull() can? Also, why does the index lookup show the value as NaN, yet replace() has no effect on that same value?

How can I replace the NaN values when fillna() is not able to recognize them?
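For the replace() part of the question, a minimal sketch with a hypothetical stand-in Series (not the real dfcomp data): the dict key 'nan' is the three-letter string, not the float NaN that isnull() detects, so nothing matches. Using np.nan itself as the key does match the missing value.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for dfcomp['Functional']: strings plus one real float NaN
s = pd.Series(['Typ', 'Min1', np.nan, 'Typ'], name='Functional')

# 'nan' here is a string, so it never matches the float NaN
unchanged = s.replace({'nan': 'Typ'})

# np.nan as the key matches the missing value
replaced = s.replace({np.nan: 'Typ'})

print(unchanged.isnull().sum())  # 1
print(replaced.isnull().sum())   # 0
```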

PVL
  • Hi PVL, welcome to SO. Images are typically discouraged for coding questions where the output could be copy and pasted into a formatted code block. You can look at https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples to see how to make your data more easily available/reproducible. – ALollz Oct 30 '19 at 18:30

3 Answers


Essentially, the problem is the return type of dfcomp['Functional'].mode(). This is a single-element pandas.Series, and fillna() expects either a scalar or a dict/Series/DataFrame whose index aligns with the column you are trying to fill.

You need to calculate the mode of the column and then pass the scalar to the fillna() method.

mode = dfcomp['Functional'].mode().values[0]
dfcomp['Functional'] = dfcomp['Functional'].fillna(value=mode)
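A minimal sketch of the point above, using a small hypothetical stand-in for dfcomp['Functional']: mode() hands back a Series, a scalar taken from it fills every NaN, and when values tie, mode() returns all of them in sorted order, so values[0] simply picks the first.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for dfcomp['Functional']
s = pd.Series(['Typ', 'Min1', np.nan, 'Typ', np.nan], name='Functional')

m = s.mode()
print(type(m).__name__)  # Series
print(len(m))            # 1

# A scalar fill value fills every missing row, whatever its index label
filled = s.fillna(m.values[0])
print(filled.isnull().sum())  # 0

# With a tie, mode() returns every tied value (sorted); values[0] takes the first
tie = pd.Series(['B', 'A', 'B', 'A', np.nan])
print(tie.mode().tolist())  # ['A', 'B']
```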
nickyfot
  • OK, let me try that. But for columns with float values I was able to assign the mode in the same way: `for col in columns: if dfcomp[col].isnull().sum()<400: dfcomp[col]=dfcomp[col].fillna(value=dfcomp[col].mode()) print(col, " ", dfcomp[col].isnull().sum()) else: print(col, ' ', 'Missing') dfcomp[col].fillna(value='Missing',inplace=True)` – PVL Oct 30 '19 at 18:42
  • It is hard to tell without actually seeing the data and the implementation, but it looks odd unless the mode Series of the float column happens to have the same number of rows as the original df – nickyfot Oct 31 '19 at 10:55
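PVL's loop from the comment above can be sketched so that it behaves the same way for float and object columns. The helper name `fill_columns_with_mode` and the sample frame are hypothetical (not from the thread); the sketch keeps the 400-missing threshold and the 'Missing' fallback from that loop, passes a scalar mode to fillna(), and also falls back when mode() comes back empty (an all-missing column).

```python
import numpy as np
import pandas as pd


def fill_columns_with_mode(df, columns, max_missing=400, fallback='Missing'):
    """Hypothetical helper mirroring PVL's loop.

    Columns with fewer than `max_missing` missing values get their modal value;
    the rest, and any column with no mode at all, get `fallback`.
    """
    out = df.copy()  # leave the caller's DataFrame untouched
    for col in columns:
        n_missing = out[col].isnull().sum()
        modes = out[col].mode()  # always a Series, possibly empty
        if n_missing < max_missing and not modes.empty:
            # A scalar fills every NaN, independent of index labels
            out[col] = out[col].fillna(modes.iloc[0])
        else:
            out[col] = out[col].fillna(fallback)
        print(col, ' ', out[col].isnull().sum())
    return out


# Hypothetical sample data: one float column, one string column, one all-missing column
df = pd.DataFrame({
    'num': [60.0, np.nan, 60.0, 80.0],
    'Functional': ['Typ', np.nan, 'Typ', 'Min1'],
    'empty': [None, None, None, None],
})
res = fill_columns_with_mode(df, ['num', 'Functional', 'empty'])
```

Pulling out `modes.iloc[0]` before calling fillna() is what makes the float and object branches identical; the empty check only matters for columns where every value is missing.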

This is an index alignment problem. pd.Series.mode always returns a Series, even if only one value is returned. The index of this Series is therefore a RangeIndex (up to the number of values tied for the mode), so when you use .fillna it tries to align on the index, which mostly doesn't line up with your DataFrame.
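A small sketch of that alignment behaviour with made-up data: a Series passed to fillna() only supplies values at matching index labels, and because mode() is indexed 0..k-1, at most the first k rows can ever be filled from it.

```python
import numpy as np
import pandas as pd

# Made-up data with a plain 0..3 index
s = pd.Series([np.nan, 1.0, np.nan, 2.0], index=[0, 1, 2, 3])

# Label 0 exists in both, so row 0 is filled; label 2 has no filler value
# and label 5 does not exist in s, so it is ignored
filler = pd.Series([10.0, 30.0], index=[0, 5])
print(s.fillna(filler).tolist())  # [10.0, 1.0, nan, 2.0]

# mode() is indexed from 0, so only row 0 can receive the modal value
s2 = pd.Series([np.nan, 3.0, 3.0, np.nan], name='foo')
print(s2.mode().index.tolist())       # [0]
print(s2.fillna(s2.mode()).tolist())  # [3.0, 3.0, 3.0, nan]
```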

You want to select the modal value, so use .iloc:

dfcomp['Functional'] = dfcomp['Functional'].fillna(dfcomp['Functional'].mode().iloc[0])

MCVE

import pandas as pd
import numpy as np

np.random.seed(42)
df = pd.DataFrame({'foo': np.random.choice([1,2,3,np.nan], 7)})

df['foo'].mode()
#0    3.0
#dtype: float64

# Nothing gets filled because only the row with Index 0 could possibly
# be filled and it wasn't missing to begin with
df['foo'].fillna(df['foo'].mode())
#0    3.0
#1    NaN
#2    1.0
#3    3.0
#4    3.0
#5    NaN
#6    1.0
#Name: foo, dtype: float64

# This fills the `NaN` with 3 regardless of index
df['foo'].fillna(df['foo'].mode().iloc[0])
#0    3.0
#1    3.0
#2    1.0
#3    3.0
#4    3.0
#5    3.0
#6    1.0
#Name: foo, dtype: float64
ALollz

In order to fill NaN values, you can use the following code:

dfcomp = dfcomp.fillna(value=0)

Later update:

dfcomp['Functional'] = dfcomp['Functional'].fillna(dfcomp['Functional'].mode().iloc[0])
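If you want to stay at the DataFrame level but fill only 'Functional' with its mode, fillna() also accepts a dict that maps column names to fill values; columns not named in the dict are left alone. The sample frame below is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for dfcomp with one extra numeric column
dfcomp = pd.DataFrame({
    'Functional': ['Typ', np.nan, 'Typ', 'Min1'],
    'Other': [1.0, np.nan, 3.0, 4.0],
})

mode_value = dfcomp['Functional'].mode().iloc[0]

# Only 'Functional' is filled; the NaN in 'Other' is left untouched
result = dfcomp.fillna({'Functional': mode_value})
print(result)
```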

Adrian B
  • He wants to fill with the mode of the column, not 0. Plus, it looks like he is trying to fillna in only one column, not the entire data frame – nickyfot Oct 30 '19 at 18:26