Pandas notnull is not working on column in data frame

Question

A follow-up to my earlier question, "Combine Pandas data frame column values into new column".

I have successfully combined a series of IDs into one field, and now I need to filter out any rows that did not end up with a Combined_ID value. Usually I would use notnull, but on this column it does not remove anything. Can anyone fill me in on the problem? Thanks!

df_merged['Combined_ID']  = df_merged[['ID1','ID2','ID3','ID4','ID5']].apply(lambda x : ''.join([e for e in x if isinstance(e, basestring)]), axis=1)

#Remove any rows that do not have an ID in the new field
#This is not removing the rows that do not have a combined ID value
df_merged = df_merged[pd.notnull(df_merged['Combined_ID'])]

Are you sure the missing row values really are null? Your code should have worked, but a Series also has a `notnull()` method available. Could you try `df_merged = df_merged[df_merged['Combined_ID'].notnull()]`? – EdChum, May 07 '15 at 20:36
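One way to check whether the "missing" values are really null is to count nulls and empty strings separately. This is only a sketch: the sample frame below is hypothetical (two ID columns instead of five), and `str` stands in for the question's Python 2 `basestring` on the assumption that the code runs under Python 3.

```python
import pandas as pd

# Hypothetical sample data; column names mirror the question.
df = pd.DataFrame({
    "ID1": ["a", None, None],
    "ID2": ["b", "c", None],
})

# Python 3 spelling of the question's lambda (str instead of basestring).
df["Combined_ID"] = df[["ID1", "ID2"]].apply(
    lambda row: "".join(e for e in row if isinstance(e, str)), axis=1
)

# Count true nulls and empty strings separately.
null_count = int(df["Combined_ID"].isnull().sum())
empty_count = int((df["Combined_ID"] == "").sum())
print("nulls:", null_count, "empty strings:", empty_count)
```

If the null count is zero while the empty-string count is not, the rows that look blank hold '' rather than NaN, and notnull has nothing to drop.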

score 2 · Accepted Answer · answered May 07 '15 at 20:54


This column is never going to be null. If no item in the row is a basestring, the join has nothing to concatenate and the function returns the empty string '', which notnull treats as a valid value.

Therefore the following should work:

df_merged = df_merged[df_merged['Combined_ID'] != '']
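As a small self-contained illustration of that filter, the sketch below uses a hypothetical three-row frame in place of df_merged. It also shows one possible alternative, not part of the original answer: converting empty strings to NaN first so that notnull can be used as the question intended.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_merged; the middle row has no combined ID.
df = pd.DataFrame({"Combined_ID": ["ab", "", "c"]})

# Option 1: compare against the empty string directly, as in the answer.
filtered = df[df["Combined_ID"] != ""]

# Option 2 (an alternative, assuming NaN is preferred for missing IDs):
# turn '' into NaN, then notnull() behaves the way the question expected.
as_nan = df["Combined_ID"].replace("", np.nan)
filtered_via_notnull = df[as_nan.notnull()]

print(filtered)
print(filtered_via_notnull)
```

Both options keep the same rows here. The second may be more convenient if later steps rely on pandas' usual missing-value handling.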


Andy Hayden


Ah, yes, you are correct Andy and Ed, as I was starting to assume. Now I can see that on the output as well (I was writing to a file where it still appeared blank). The != '' works perfectly. Thanks very much! – EMC May 07 '15 at 21:06
