
I have a pandas DataFrame with 83 columns and 4000 rows. I intend to use the data for a logistic regression, so I want to narrow the columns down to those with the least missing data.

To do this I was thinking of ranking the columns by how many NaN observations each contains. I tried a few things like:

econ_balance["BG.GSR.NFSV.GD.ZS"].describe()
econ_balance["BG.GSR.NFSV.GD.ZS"].value_counts
econ_balance["BG.GSR.NFSV.GD.ZS"]["NaN"]
econ_balance["BG.GSR.NFSV.GD.ZS"][NaN]

None of these seems to work. I also tried googling to see whether this question had been answered before, but no luck.

Thanks in advance for the help

Josh


1 Answer


If you're looking just to count the NaN values:

In [1]:

import numpy as np
import pandas as pd

In [2]:

df = pd.DataFrame({'a': [0, 1, np.nan, np.nan, np.nan], 'b': np.nan, 'c': [np.nan, 1, 2, 3, np.nan]})
df
Out[2]:
    a   b   c
0   0 NaN NaN
1   1 NaN   1
2 NaN NaN   2
3 NaN NaN   3
4 NaN NaN NaN
In [6]:

df.isnull().astype(int).sum()
Out[6]:
a    3
b    5
c    2
dtype: int64

EDIT: @CTZhu has pointed out that the type casting is unnecessary, since sum() already counts True values as 1:

In [7]:

df.isnull().sum()
Out[7]:
a    3
b    5
c    2
dtype: int64
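
To go from counting to the ranking the question asks for, you can sort the counts. A minimal sketch, assuming a pandas version with sort_values (older versions used Series.order() instead), and with the cutoff of 40 columns as a purely arbitrary example:

# rank the columns by how many NaNs each contains, fewest first
na_counts = econ_balance.isnull().sum().sort_values()

# keep only the columns with the least missing data,
# e.g. the 40 best ones (arbitrary cutoff for illustration)
econ_subset = econ_balance[na_counts.index[:40]]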
EdChum