
I have a DataFrame with many rows, some of which contain low-frequency values. I need to count values column by column and replace any value that occurs fewer than 3 times in its column.

DF-Input

Col1     Col2     Col3       Col4
 1        apple    tomato     apple
 1        apple    potato     nan
 1        apple    tomato     banana
 1        apple    tomato     banana
 1        apple    tomato     banana
 1        apple    tomato     banana
 1        grape    tomato     banana
 1        pear     tomato     banana
 1        lemon    tomato     burger

DF-Output

Col1     Col2     Col3       Col4
 1        apple    tomato     Other
 1        apple    Other      nan
 1        apple    tomato     banana
 1        apple    tomato     banana
 1        apple    tomato     banana
 1        apple    tomato     banana
 1        Other    tomato     banana
 1        Other    tomato     banana
 1        Other    tomato     Other
aiden rosenblatt
  • I don't think Ayhan's original solution handles the NaN in the original dataframe for this question. I suggest this isn't a dupe. – Scott Boston Jan 30 '18 at 22:36

1 Answer


You can use where with groupby and transform('count'):

df.where(df.apply(lambda x: x.groupby(x).transform('count')>2), 'Other')

Output:

       Col2    Col3    Col4
Col1                       
1     apple  tomato   Other
1     apple   Other  banana
1     apple  tomato  banana
1     apple  tomato  banana
1     apple  tomato  banana
1     apple  tomato  banana
1     Other  tomato  banana
1     Other  tomato  banana
1     Other  tomato   Other
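
To see what the one-liner does, here is a rough step-by-step sketch on a single column. The Series below is rebuilt by hand from Col2 in the question, and the names col, counts, keep and result are only illustrative.

```python
import pandas as pd

# Col2 from the question, rebuilt by hand for illustration.
col = pd.Series(["apple"] * 6 + ["grape", "pear", "lemon"], name="Col2")

# Group the column by its own values and broadcast each group's size
# back onto every row that belongs to that group.
counts = col.groupby(col).transform("count")

# True where the value occurs at least 3 times in the column.
keep = counts > 2

# where() keeps entries where the mask is True and substitutes
# "Other" everywhere else.
result = col.where(keep, "Other")
print(result.tolist())
```

df.apply runs the groupby/transform step once per column, and df.where then applies the resulting boolean frame to the whole DataFrame in one go.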

Update: To handle NaN in original dataframe:

d = df.apply(lambda x: x.groupby(x).transform('count'))
df.where(d.gt(2.0).where(d.notnull()).astype(bool), 'Other')

Output:

       Col2    Col3    Col4
Col1                       
1     apple  tomato   Other
1     apple   Other     NaN
1     apple  tomato  banana
1     apple  tomato  banana
1     apple  tomato  banana
1     apple  tomato  banana
1     Other  tomato  banana
1     Other  tomato  banana
1     Other  tomato   Other
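
As an alternative sketch (not the code above), the same result can probably be had with value_counts and map, which keeps the NaN handling explicit. The helper name replace_rare and its parameters min_count and fill are made up for this example, and Col1 is left out for brevity.

```python
import numpy as np
import pandas as pd

def replace_rare(df, min_count=3, fill="Other"):
    """Replace values seen fewer than min_count times in their column.

    Hypothetical helper for illustration; missing values are left as they are.
    """
    def per_column(col):
        # value_counts() drops NaN by default, so NaN rows map to NaN counts.
        counts = col.map(col.value_counts())
        # Keep frequent values, and keep missing values untouched.
        return col.where((counts >= min_count) | col.isna(), fill)
    return df.apply(per_column)

# Sample data rebuilt by hand from the question.
df = pd.DataFrame({
    "Col2": ["apple"] * 6 + ["grape", "pear", "lemon"],
    "Col3": ["tomato", "potato"] + ["tomato"] * 7,
    "Col4": ["apple", np.nan] + ["banana"] * 6 + ["burger"],
})

out = replace_rare(df)
print(out)
```

Spelling out col.isna() in the mask avoids relying on how NaN is cast to bool, at the cost of one extra pass over each column.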
Scott Boston