I have a parent dataframe stored somewhere, and depending on requirements I only want to process some of its rows. I would like to pass just that subset around for processing, rather than the whole dataframe, because df.loc[mask, col] = value operations are messier to deal with in generic code. The problem is that operations performed on the subset are not reflected in the original dataframe. For example:

def get_subset(row_indices):
   return self.dataframe.loc[self.dataframe[index_column].isin(row_indices)]

sub_df = get_subset([1,2,3])
sub_df['text'] = sub_df['text'].str.lower()

This snippet updates the text column in sub_df, but not in the main dataframe inside the class. Is there a way to mask a dataframe so that such operations happen in place? Thanks!
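For illustration, here is a minimal, self-contained sketch of the behavior described above, together with one possible workaround: writing the processed subset back by row label. The `Holder` class, its constructor, the `write_back` method, and the sample data are all hypothetical names introduced for this example, and the sketch assumes the parent dataframe has unique index labels.

```python
import pandas as pd


class Holder:
    """Hypothetical container for the parent dataframe (an assumption)."""

    def __init__(self, dataframe, index_column):
        self.dataframe = dataframe
        self.index_column = index_column

    def get_subset(self, row_indices):
        mask = self.dataframe[self.index_column].isin(row_indices)
        # .copy() makes it explicit that the subset is independent of the parent.
        return self.dataframe.loc[mask].copy()

    def write_back(self, sub_df):
        # Hypothetical helper: assign each column back by row label.
        # Assumes sub_df still carries the parent's (unique) index labels.
        for col in sub_df.columns:
            self.dataframe.loc[sub_df.index, col] = sub_df[col]


holder = Holder(
    pd.DataFrame({"id": [1, 2, 3, 4], "text": ["AA", "BB", "CC", "DD"]}),
    index_column="id",
)

sub_df = holder.get_subset([1, 2, 3])
sub_df["text"] = sub_df["text"].str.lower()

# The parent is untouched at this point, as described in the question.
parent_before = holder.dataframe["text"].tolist()

# Explicitly push the processed rows back into the parent.
holder.write_back(sub_df)
parent_after = holder.dataframe["text"].tolist()
```

With this approach the generic processing code only ever sees sub_df, and the df.loc[...] assignment is confined to a single write-back step. Whether that is acceptable depends on how often the subset has to be synchronized with the parent.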

Ramsha Siddiqui
  • Is ```get_subset()``` a class method? If not, where are ```self```, ```dataframe```, and ```index_column``` defined? – LTheriault Jun 18 '20 at 16:59
  • [This](https://stackoverflow.com/questions/20625582/how-to-deal-with-settingwithcopywarning-in-pandas) might help – davidbilla Jun 18 '20 at 17:30

0 Answers