4

I am trying to find the null values in a DataFrame. Although I reviewed the following Stack Overflow post, which describes how to count null values, I am having a hard time doing the same for my dataset.

How to count the Nan values in the column in Panda Data frame

Working code:

import pandas as pd
a = ['america','britain','brazil','','china','jamaica'] #I deliberately introduce a NULL value
a = pd.DataFrame(a)
a.isnull()

#Output:
       0
0  False
1  False
2  False
3  False
4  False
5  False

a.isnull().sum()
#Output
#0    0
#dtype: int64

What am I doing wrong?


3 Answers

1

The '' in your list isn't a null value, it's an empty string. To get a null, use None instead. The pandas.isnull() documentation describes missing values as "NaN in numeric arrays, [or] None/NaN in object arrays".

import pandas as pd
a = ['america','britain','brazil',None,'china','jamaica']
a = pd.DataFrame(a)
a.isnull()

       0
0  False
1  False
2  False
3   True
4  False
5  False

You can see the difference by printing the two dataframes. In the first case, the dataframe looks like:

pd.DataFrame(['america','britain','brazil','','china','jamaica'])

         0
0  america
1  britain
2   brazil
3         
4    china
5  jamaica

Notice that the value at index 3 is an empty string.

In the second case, you get:

pd.DataFrame(['america','britain','brazil',None,'china','jamaica'])

         0
0  america
1  britain
2   brazil
3     None
4    china
5  jamaica
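
If the empty strings are already in the data and cannot be fixed at the source, one possible approach is to convert them to NaN before calling isnull(). This is a minimal sketch rather than part of the original answer; it assumes every empty string should be treated as missing, and the name cleaned is only illustrative.

```python
import numpy as np
import pandas as pd

# The list from the question, with an empty string at index 3
a = pd.DataFrame(['america', 'britain', 'brazil', '', 'china', 'jamaica'])

# Assumption: every empty string should count as a missing value
cleaned = a.replace('', np.nan)

# isnull() now flags the former empty string
null_count = cleaned.isnull().sum()
print(null_count)
```

With the question's data, this should report one null in column 0.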
Craig
  • "Unlike standard Python, an empty string in pandas isn't considered a null value." I don't think this is correct. The empty string is still a string! – Andy Hayden May 06 '17 at 05:29
  • @AndyHayden That line was poor wording on my part. I was trying to point out the difference between null values and things that test as False. – Craig May 06 '17 at 17:36
1

If you want '', None and NaN to all count as null, you can use applymap to coerce each value in the dataframe to a boolean and then call .sum on the result:

import pandas as pd
import numpy as np


a = ['america','britain','brazil',None,'', np.nan, 'china','jamaica'] # None, '' and np.nan are deliberately included
a = pd.DataFrame(a)
a.applymap(lambda x: not x or pd.isnull(x)).sum()

# 0    3
# dtype: int64
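
Two caveats apply to the lambda above: not x follows Python truthiness, so values such as 0 or False would also be counted, and applymap is deprecated as of pandas 2.1 in favor of DataFrame.map. A possible alternative, sketched here under the assumption that only '', None and NaN should count, combines isnull() with an explicit comparison against the empty string. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame(['america', 'britain', 'brazil', None, '', np.nan, 'china', 'jamaica'])

# True where the value is None/NaN or exactly the empty string
null_or_empty = a.isnull() | (a == '')

# Count the flagged cells per column
count = null_or_empty.sum()
print(count)
```

This avoids calling a Python function on every cell and leaves other falsy values alone.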

I hope this helps.

Abdou
0

The other answers explained that '' is not a null value and therefore isn't counted as such by the isnull method...

...However, '' does evaluate to False when interpreted as a bool.

a.astype(bool)

       0
0   True
1   True
2   True
3  False
4   True
5   True

This might be useful if you have '' in your dataframe and want to process it this way.
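
As a sketch of how that could be turned into a count, the boolean frame can be inverted and summed. The example assumes the question's data and sets dtype=object explicitly (an assumption, so that the conversion follows Python truthiness); is_empty and empty_count are illustrative names.

```python
import pandas as pd

# dtype=object is an assumption so that astype(bool) follows Python truthiness
a = pd.DataFrame(['america', 'britain', 'brazil', '', 'china', 'jamaica'], dtype=object)

# astype(bool) is False for '', so inverting it marks the empty cells
is_empty = ~a.astype(bool)

# Count the empty cells per column
empty_count = is_empty.sum()
print(empty_count)
```

Other falsy values such as None convert to False as well, so this counts falsy cells rather than strictly empty strings.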

piRSquared