1

I am trying to write a function that will check a specified column for nulls, within a group in a dataframe. The example dataframe has two columns, ID and VALUE. Multiple rows exist per ID. I want to know if ANY of the rows for a particular ID have a NULL value in VALUE.

I have tried building the function with iterrows().

df = pd.DataFrame({'ID':[1,2,2,3,3,3],
                   'VALUE':[50,None,30,20,10,None]})

def nullValue(col):
for i, row in col.iterrows():
    if ['VALUE'] is None:
        return 1
    else:
        return 0

df2 = df.groupby('ID').apply(nullVALUE)
df2.columns = ['ID','VALUE','isNULL']
df2

I am expecting to retrieve a dataframe with three columns, ID, VALUE, and isNULL. If any row in a grouped ID has a null, all of the rows for that ID should have a 1 under isNull.

Example:

ID  VALUE     isNULL
1   50.0      0
2   NaN       1
2   30.0      1
3   20.0      1
3   10.0      1
3   NaN       1
Fungui
  • 49
  • 4

1 Answers1

0

A quick solution, borrowed partially from this answer is to use groupby with transform:

df = pd.DataFrame({'ID':[1,2,2,3,3,3,3],
                   'VALUE':[50,None,None,30,20,10,None]})


df['isNULL'] = (df.VALUE.isnull().groupby([df['ID']]).transform('sum') > 0).astype(int)

Out[51]: 
   ID  VALUE  isNULL
0   1   50.0       0
1   2    NaN       1
2   2    NaN       1
3   3   30.0       1
4   3   20.0       1
5   3   10.0       1
6   3    NaN       1
realr
  • 3,652
  • 6
  • 23
  • 34
  • 1
    This would result in non 0/1 results if there were more than one NULL per group, right? – Fungui Jul 24 '19 at 22:22
  • @Fungui you were right, just corrected my response to return only 1/0. If you remove the `astype(int)` you can use the `bool` results too. – realr Jul 24 '19 at 22:29