I have a scenario where a parent DataFrame is stored somewhere, and depending on the requirements I only want to process some of its rows. I would like to pass just that subset around for processing rather than the whole DataFrame, because `df.loc[mask, col] = value` operations are messier to handle in generic code. The problem is that operations performed on the subset are not reflected in the original DataFrame. For example:
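```python
def get_subset(self, row_indices):
    # Boolean-mask selection with .loc returns a new DataFrame (a copy)
    return self.dataframe.loc[self.dataframe[index_column].isin(row_indices)]

sub_df = self.get_subset([1, 2, 3])
sub_df['text'] = sub_df['text'].str.lower()  # changes sub_df only
```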
This snippet updates the `text` column in `sub_df`, but not in the main DataFrame inside the class. Is there a way to mask a DataFrame so that these operations are applied in place? Thanks!
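Boolean-mask selection with `.loc` always produces a copy, so changes to `sub_df` never reach the parent. One possible workaround is to let the generic code operate on the copy and then write the result back with `DataFrame.update`, which aligns on index and column labels and modifies the parent in place. The following is a minimal sketch. It assumes the parent has a unique index and that the processing function returns the modified subset. `process_subset` and `lower_text` are hypothetical names, and the small `id`/`text` frame stands in for the question's `self.dataframe` and `index_column`.

```python
import pandas as pd


def process_subset(df, mask, func):
    """Hypothetical helper: run func on the masked rows, then write them back into df.

    Assumes df has a unique index so update() can align rows by label.
    """
    sub_df = df.loc[mask].copy()  # explicit copy silences SettingWithCopyWarning inside func
    sub_df = func(sub_df)         # generic code only ever sees the subset
    df.update(sub_df)             # overwrite matching labels in df, in place
    return sub_df


def lower_text(sub):
    # Example processing step written against the subset only
    sub["text"] = sub["text"].str.lower()
    return sub


df = pd.DataFrame({"id": [1, 2, 3, 4], "text": ["A", "B", "C", "D"]})
mask = df["id"].isin([1, 2, 3])
process_subset(df, mask, lower_text)
print(df["text"].tolist())  # ['a', 'b', 'c', 'D']
```

One caveat: `update` skips NA values in the subset, so a processing step that sets cells to NaN will not overwrite the parent. In that case an explicit `df.loc[sub_df.index, cols] = sub_df[cols]` write-back is still needed.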