
I have a dataframe which has multiple columns. I'd like to iterate through the columns, counting for each column how many null values there are and produce a new dataframe which displays the sum of isnull values alongside the column header names.

If I do:

for col in main_df:
    print(sum(pd.isnull(main_df[col])))

I get a list of the null count for each column:

0
1
100

What I'm trying to do is create a new dataframe which has the column header alongside the null count, e.g.

col1 | 0
col2 | 1
col3 | 100
  • dupe: http://stackoverflow.com/questions/26266362/how-to-count-the-nan-values-in-the-column-in-panda-data-frame – EdChum Jan 16 '17 at 20:31

3 Answers


Try this:

In [71]: df
Out[71]:
     a    b  c
0  NaN  7.0  0
1  0.0  NaN  4
2  2.0  NaN  4
3  1.0  7.0  0
4  1.0  3.0  9
5  7.0  4.0  9
6  2.0  6.0  9
7  9.0  6.0  4
8  3.0  0.0  9
9  9.0  0.0  1

In [72]: pd.isnull(df).sum()
Out[72]:
a    1
b    2
c    0
dtype: int64

or:

In [76]: df.isnull().sum()
Out[76]:
a    1
b    2
c    0
dtype: int64

You can create a DataFrame out of it:

In [78]: df.isnull().sum().to_frame('nulls')
Out[78]:
   nulls
a      1
b      2
c      0
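If you want the column names as a regular column rather than the index (matching the `col1 | 0` layout asked for in the question), you can reset the index after `to_frame`. A minimal sketch, using a small example frame similar to the one above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [np.nan, 0.0, 2.0],
                   'b': [7.0, np.nan, np.nan],
                   'c': [0, 4, 4]})

# Series of per-column null counts -> two-column DataFrame
nulls = (df.isnull().sum()
           .to_frame('nulls')
           .reset_index()                      # moves column names into a column
           .rename(columns={'index': 'column'}))
print(nulls)
```

This yields one row per original column, with a `column` column holding the header names and a `nulls` column holding the counts.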

If the number of columns in your dataframe is large, the middle rows of the null-count output will be truncated in the display. You can print every column using:

nulls = df.isnull().sum().to_frame()
for index, row in nulls.iterrows():
    print(index, row[0])  # to_frame() with no name labels its single column 0
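Instead of iterating, you could also lift pandas' display limit temporarily; a sketch assuming a reasonably recent pandas, with a 100-column frame wide enough to be truncated by default:

```python
import numpy as np
import pandas as pd

# A wide frame whose null-count Series would normally be elided when printed
df = pd.DataFrame(np.zeros((3, 100)), columns=[f'col{i}' for i in range(100)])
df.iloc[0, 3] = np.nan

# Temporarily disable row truncation for this print only
with pd.option_context('display.max_rows', None):
    print(df.isnull().sum())
```

`option_context` restores the previous display settings on exit, so the rest of the session is unaffected.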

For a given dataset, the following will show you the columns with missing values and their counts:

dataset.isnull().sum(axis = 0)
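Note that `axis=0` (the default) counts nulls down each column; `axis=1` would instead count nulls across each row. A small sketch illustrating both, on a hypothetical two-row frame:

```python
import numpy as np
import pandas as pd

dataset = pd.DataFrame({'x': [1.0, np.nan],
                        'y': [np.nan, np.nan]})

print(dataset.isnull().sum(axis=0))  # per column: x -> 1, y -> 2
print(dataset.isnull().sum(axis=1))  # per row:    0 -> 1, 1 -> 2
```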