I have written the following function:

def drop_low_counts(df, column):
    for value in df[column]:
        if df[column].value_counts()[str(value)] < 5 == True:
            df.drop(df[df[column] == str(value)].index, axis=0)
            return df

It is supposed to drop the rows of a df where the value in a certain column appears fewer than 5 times in that column. I am not sure where I have gone wrong, but when I run the function, the df does not change at all. I have been calling it as a regular function, but maybe I need to use df.apply()? Thank you for the help!

1 Answer

The return statement is inside the for loop, so the function returns the dataframe after processing only the first value in the column. Move the return statement outside the loop so that the function returns the df after all values have been processed.

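You should also assign the result of df.drop back to df. By default, drop returns a new dataframe and leaves the one it was called on unchanged, so without the assignment the dropped rows are never removed. Finally, remove the == True from the condition: Python reads count < 5 == True as a chained comparison, (count < 5) and (5 == True), which is always False.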

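def drop_low_counts(df, column):
    # Count each value once, before any rows are removed; looking up a
    # value that has already been dropped would raise a KeyError.
    counts = df[column].value_counts()
    for value, count in counts.items():
        if count < 5:
            # drop() returns a new dataframe, so assign it back to df.
            df = df.drop(df[df[column] == value].index, axis=0)
    return df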
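
As an aside, the same filtering can be done without an explicit loop. The following is a minimal sketch, not the only way to do it: the function name drop_low_counts_vectorized, the min_count parameter, and the sample "city" data are made up for illustration. It maps each row to the count of its value and keeps the rows that meet the threshold.

```python
import pandas as pd

def drop_low_counts_vectorized(df, column, min_count=5):
    # Hypothetical helper: map every row to the number of times its
    # value occurs in the column.
    counts = df[column].map(df[column].value_counts())
    # Keep rows whose value occurs at least min_count times. Rows with a
    # missing value get a NaN count, fail the comparison, and are dropped.
    return df[counts >= min_count]

# Made-up sample data: "B" occurs only twice, so its rows should go.
df = pd.DataFrame({"city": ["A"] * 5 + ["B"] * 2 + ["C"] * 6})
result = drop_low_counts_vectorized(df, "city")
print(result["city"].value_counts())
```

Like the loop version, this returns a new dataframe rather than changing the one passed in, so the caller still needs to write df = drop_low_counts_vectorized(df, "city") to keep the result.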

InvstIA