I'm struggling to update the original DataFrame when I use a user-defined function to run a task:

def filter_rows(df, col, string):
    df[col] = df[col].astype(str)
    df = df[~df[col].str.contains(string)]
    return df

When I run the above function on df1, it does return a trimmed-down version, but df1 itself is not updated.

filter_rows(df1, 'smoking_status', 'never smoked')

That is, if I view df1 separately in the next cell, I still see the complete, untrimmed dataset.

I've used inplace=True where possible, but I can't seem to find a way to do that here.

I need a solution that can be used in other situations as well.

Thanks in advance. :)

2 Answers

Assign the dataframe the function returns back to the original dataframe.

df1 = filter_rows(df1,'smoking_status','never smoked')
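
To illustrate the reassignment pattern, here is a minimal sketch on a made-up three-row frame. The toy data is hypothetical, and this filter_rows is a non-mutating variant of the one in the question: it builds the mask from a string copy of the column instead of overwriting the column, and it passes regex=False so the search string is matched literally (an assumption about what is wanted here).

```python
import pandas as pd

def filter_rows(df, col, string):
    # Build a boolean mask from a string view of the column;
    # the input frame is not modified.
    mask = df[col].astype(str).str.contains(string, regex=False)
    # Return a new frame without the matching rows.
    return df[~mask]

# Hypothetical sample data standing in for the real df1.
df1 = pd.DataFrame({
    "id": [1, 2, 3],
    "smoking_status": ["smokes", "never smoked", "formerly smoked"],
})

# Rebind df1 to the filtered result so later cells see the trimmed frame.
df1 = filter_rows(df1, "smoking_status", "never smoked")
print(df1["smoking_status"].tolist())  # ['smokes', 'formerly smoked']
```

The same "return a new object and reassign" pattern carries over to most other pandas operations, which is why it tends to generalize better than relying on inplace=True.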
norie

Reassigning df inside the function only rebinds the local name df to a new object; the caller's df1 still refers to the original DataFrame. For more background on how Python passes arguments, refer to this SO answer.
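
The difference between rebinding a name and mutating an object is not specific to pandas. As a rough sketch, the same behaviour can be seen with a plain list; the function names rebind and mutate and the sample values are made up for illustration.

```python
def rebind(items):
    # Plain assignment creates a new list and binds the local name to it;
    # the caller's list is left untouched.
    items = [x for x in items if x != "never smoked"]
    return items

def mutate(items):
    # Slice assignment replaces the contents of the existing list object,
    # so the caller sees the change.
    items[:] = [x for x in items if x != "never smoked"]

statuses = ["smokes", "never smoked", "formerly smoked"]

rebound = rebind(statuses)
len_after_rebind = len(statuses)   # 3: the original list is unchanged

mutate(statuses)
len_after_mutate = len(statuses)   # 2: the original list was modified
```

In the question's function, df = df[...] plays the role of rebind, which is why df1 looks unchanged in the next cell.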

To filter df without reassigning it, you can use df.drop with inplace=True, which removes the matching rows from the existing object:

df.drop(df[df[col].str.contains(string)].index, inplace=True)

So your final function would be:

def filter_rows(df, col, string):
    df[col] = df[col].astype(str)
    df.drop(df[df[col].str.contains(string)].index, inplace=True)
    return df
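
As a quick check of the in-place version, the sketch below runs it on a made-up three-row frame (the sample data is hypothetical) and confirms that df1 itself shrinks without any reassignment.

```python
import pandas as pd

def filter_rows(df, col, string):
    # Convert the column to strings so .str.contains works on every value.
    df[col] = df[col].astype(str)
    # Drop the matching rows by index label, modifying the caller's frame.
    df.drop(df[df[col].str.contains(string)].index, inplace=True)
    return df

# Hypothetical sample data standing in for the real df1.
df1 = pd.DataFrame({
    "id": [1, 2, 3],
    "smoking_status": ["smokes", "never smoked", "formerly smoked"],
})

# No assignment here: the function changes df1 directly.
filter_rows(df1, "smoking_status", "never smoked")
print(len(df1))  # 2
```

Note that drop removes rows by index label, so this approach assumes the index is unique; with duplicate labels, every row sharing a matched label would be dropped.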
Raghava Dhanya