5

I have a large data frame with 85 columns. The missing data has been coded as NaN. My goal is to get the number of missing values in each column, so I wrote a for loop to build a list of the counts, but it does not work.

Here is my code:

headers = x.columns.values.tolist() 
nans=[]
for head in headers:
    nans_col = x[x.head == 'NaN'].shape[0]
    nan.append(nans_col)

If I use the code from the loop body for one specific column, replacing `head` with that column's name, it works and gives me the amount of missing data in that column.

So I do not know how to correct the for loop. Could somebody kindly help me with this? I highly appreciate your help.

Karl
vivian
  • You've compared the entry to the string `'NaN'`, which is not even the data type you need. Look up the `isnan` function and, in general, how to detect `NaN` values. – Prune Oct 18 '18 at 00:35
  • @Prune Thanks for your comments! I coded missing data as np.nan. Then isnull() works to find missing data. – vivian Oct 18 '18 at 03:55
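As the comments note, the loop has two bugs: `x.head` resolves to the `DataFrame.head` method rather than the column named in `head`, and comparing to the string `'NaN'` never matches a real `np.nan`. A minimal corrected sketch (the small example frame here is an assumption for illustration):

```python
import numpy as np
import pandas as pd

x = pd.DataFrame({'a': [1, np.nan], 'b': [np.nan, np.nan]})  # stand-in for the 85-column frame

headers = x.columns.values.tolist()
nans = []
for head in headers:
    # x[head] selects the column; isnull() detects np.nan values
    nans_col = x[x[head].isnull()].shape[0]
    nans.append(nans_col)

print(nans)  # [1, 2]
```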

5 Answers

10

For the columns of a pandas (Python data analysis library) DataFrame you can use:

In [3]: import numpy as np
In [4]: import pandas as pd
In [5]: df = pd.DataFrame({'a':[1,2,np.nan], 'b':[np.nan,1,np.nan]})
In [6]: df.isnull().sum()
Out[6]:
a    1
b    2
dtype: int64

For a single column or Series you can count the missing values as shown below:

In [1]: import numpy as np
In [2]: import pandas as pd
In [3]: s = pd.Series([1,2,3, np.nan, np.nan])

In [4]: s.isnull().sum()
Out[4]: 2

Reference
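On a wide frame like the 85-column one in the question, a common follow-up is to sort the per-column counts so the worst columns come first; a small sketch (the example frame is an assumption):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, np.nan], 'b': [np.nan, 1, np.nan]})

# Per-column missing counts, largest first; chaining a second .sum()
# instead would give the grand total of missing values
counts = df.isnull().sum().sort_values(ascending=False)
print(counts)
```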

1

This gives you a per-column count of missing values: for each column name it prints the `value_counts` of the null mask, where the `True` rows are the missing ones.

missing_data = df.isnull()
for column in missing_data.columns.values.tolist():
    print(column)
    print(missing_data[column].value_counts())
    print("")
bbarnes8
1

Just use `DataFrame.info`; the non-null count is probably what you want, and more.

>>> pd.DataFrame({'a':[1,2], 'b':[None, None], 'c':[3, None]}) \
.info(verbose=True, null_counts=True)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   a       2 non-null      int64    
 1   b       0 non-null      object
 2   c       1 non-null      float64
dtypes: float64(1), int64(1), object(1)
memory usage: 176.0+ bytes
B.Mr.W.
  • if you're getting `'Series' object has no attribute 'info'` for a single column, try this `df['a'].isna().sum()` – PatrickT Nov 14 '21 at 07:39
0

If there are multiple data frames, below is a function to calculate the number of missing values in each column, with percentages:

Missing Data Analysis

import pandas as pd  # needed for pd.DataFrame below

def miss_data(df):
    x = ['column_name', 'missing_data', 'missing_in_percentage']
    missing_data = pd.DataFrame(columns=x)
    columns = df.columns
    for col in columns:
        icolumn_name = col
        imissing_data = df[col].isnull().sum()
        imissing_in_percentage = (df[col].isnull().sum() / df[col].shape[0]) * 100
        missing_data.loc[len(missing_data)] = [icolumn_name, imissing_data, imissing_in_percentage]
    print(missing_data)
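For a self-contained check, here is the same logic repeated so the snippet runs on its own (the example frame is an assumption, and a `return` is added so the result can be inspected as well as printed):

```python
import numpy as np
import pandas as pd

def miss_data(df):
    # One row per column: column name, missing count, percent missing
    cols = ['column_name', 'missing_data', 'missing_in_percentage']
    missing_data = pd.DataFrame(columns=cols)
    for col in df.columns:
        n_missing = df[col].isnull().sum()
        pct_missing = n_missing / df[col].shape[0] * 100
        missing_data.loc[len(missing_data)] = [col, n_missing, pct_missing]
    print(missing_data)
    return missing_data  # returning makes the result reusable

df = pd.DataFrame({'a': [1, np.nan], 'b': [np.nan, np.nan]})
result = miss_data(df)
```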
Community
GpandaM
  • stumbled across this function, was looking for something like this, not working for me. – Ricky Sep 07 '21 at 10:18
0
# Function to show the total null values per column
colum_name = np.array(data.columns.values)

def iter_columns_name(colum_name):
    for k in colum_name:
        print("total nulls {}=".format(k), pd.isnull(data[k]).values.ravel().sum())

# Call the function
iter_columns_name(colum_name)

# output
total nulls start_date= 0
total nulls end_date= 0
total nulls created_on= 0
total nulls lat= 9925
.
.
.