
A follow-up to the question "Combine Pandas data frame column values into new column".

I have successfully combined a series of IDs into one field, and now I need to filter out any rows that did not end up with a Combined_ID value. Usually I would use notnull, but on this column it does not remove anything. Can anyone tell me what the problem is? Thanks!

df_merged['Combined_ID'] = df_merged[['ID1','ID2','ID3','ID4','ID5']].apply(lambda x: ''.join([e for e in x if isinstance(e, basestring)]), axis=1)

#Remove any rows that do not have an ID in the new field
#This is not removing the rows that do not have a combined ID value
df_merged = df_merged[pd.notnull(df_merged['Combined_ID'])]
EMC
  • Are you sure that the missing row values really are null? Your code should have worked, but a Series also has a `notnull()` method available. Could you try: `df_merged = df_merged[df_merged['Combined_ID'].notnull()]` – EdChum May 07 '15 at 20:36

1 Answer


This column is never going to be null. If no item in the row is a basestring, the list passed to join is empty and the function returns '' (an empty string), which notnull does not treat as missing.

Therefore the following should work:

df_merged = df_merged[df_merged['Combined_ID'] != '']
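To see why notnull keeps every row, here is a minimal sketch on made-up data. The three-column frame and the names `id_cols`, `all_present`, `filtered`, `as_nan` and `dropped` are assumptions for illustration only, and `str` stands in for Python 2's `basestring` so that it runs on Python 3.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the real df_merged is not shown in the question.
df_merged = pd.DataFrame({
    "ID1": ["A1", np.nan, np.nan],
    "ID2": [np.nan, "B2", np.nan],
    "ID3": ["C3", np.nan, np.nan],
})
id_cols = ["ID1", "ID2", "ID3"]

# Same idea as the question's lambda, with str in place of basestring.
df_merged["Combined_ID"] = df_merged[id_cols].apply(
    lambda row: "".join(e for e in row if isinstance(e, str)), axis=1
)

# The all-NaN row produces '' rather than NaN, so notnull() is True everywhere.
all_present = df_merged["Combined_ID"].notnull().all()

# Comparing against '' removes the row that has no ID.
filtered = df_merged[df_merged["Combined_ID"] != ""]

# Alternative: convert '' to NaN first, then the usual notnull filter works.
as_nan = df_merged["Combined_ID"].replace("", np.nan)
dropped = df_merged[as_nan.notnull()]
```

Both `filtered` and `dropped` should end up with the same two rows. The second form may be handier if later steps already expect NaN for missing IDs.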
Andy Hayden
  • Ah, yes, you are correct Andy and Ed, as I was starting to assume. Now I can see that on the output as well (I was writing to a file where it still appeared blank). The != '' works perfectly. Thanks very much! – EMC May 07 '15 at 21:06