I have a dataframe in which one column is categorical. Many values in this column repeat, but the counts are very uneven: some values occur only a single-digit number of times, while others occur hundreds or thousands of times. I want to replace every value in this column whose count is less than 10 with the value 'other'. Below, I reproduce the problem with a small example dataframe, using a threshold of 3 instead of 10.
The code for the example dataframe is as follows:
import pandas as pd
l1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
l2 = ['aa', 'bb', 'aa', 'bb', 'bb', 'ee', 'bb', 'gg', 'gg', 'gg']
dataframe = pd.DataFrame(zip(l1, l2), columns=['l1', 'l2'])
The resulting dataframe looks like this:
l1 | l2 |
---|---|
1 | 'aa' |
2 | 'bb' |
3 | 'aa' |
4 | 'bb' |
5 | 'bb' |
6 | 'ee' |
7 | 'bb' |
8 | 'gg' |
9 | 'gg' |
10 | 'gg' |
Now, if I print value_counts() for column 'l2', I get the count of every value in that column:
dataframe.l2.value_counts()
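For reference, here is a self-contained version of that step for the example data. It is only a sketch of what I am looking at; the name `counts` is introduced here for illustration, and the counts are converted to a plain dict so the result does not depend on how a particular pandas version prints a Series.

```python
import pandas as pd

l1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
l2 = ['aa', 'bb', 'aa', 'bb', 'bb', 'ee', 'bb', 'gg', 'gg', 'gg']
dataframe = pd.DataFrame(zip(l1, l2), columns=['l1', 'l2'])

# Count how many times each distinct value appears in column 'l2'.
# value_counts() sorts by count in descending order by default.
counts = dataframe['l2'].value_counts()

print(counts.to_dict())
# {'bb': 4, 'gg': 3, 'aa': 2, 'ee': 1}
```

So 'aa' (2 occurrences) and 'ee' (1 occurrence) are the values that fall below the threshold of 3.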
My question is: how do I replace all values in the 'l2' column that have a value count < 3 with the value 'other'? My expected dataframe is:
l1 | l2 |
---|---|
1 | 'other' |
2 | 'bb' |
3 | 'other' |
4 | 'bb' |
5 | 'bb' |
6 | 'other' |
7 | 'bb' |
8 | 'gg' |
9 | 'gg' |
10 | 'gg' |
As you can see, all instances of 'aa' and 'ee' are replaced with 'other' because their value counts are less than 3. How can I do this?
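To make the goal concrete, here is a minimal sketch of one way this might be done; I am not sure it is the best or most idiomatic approach. It assumes the per-value counts can be mapped back onto each row with `Series.map` and the rare values then replaced with `Series.where`. The names `min_count` and `row_counts` are made up for this illustration.

```python
import pandas as pd

l1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
l2 = ['aa', 'bb', 'aa', 'bb', 'bb', 'ee', 'bb', 'gg', 'gg', 'gg']
dataframe = pd.DataFrame(zip(l1, l2), columns=['l1', 'l2'])

min_count = 3  # hypothetical threshold; would be 10 for my real data

# For every row, look up how often that row's value occurs in the column.
row_counts = dataframe['l2'].map(dataframe['l2'].value_counts())

# Keep values that occur at least min_count times; replace the rest with 'other'.
dataframe['l2'] = dataframe['l2'].where(row_counts >= min_count, 'other')

print(dataframe['l2'].tolist())
# ['other', 'bb', 'other', 'bb', 'bb', 'other', 'bb', 'gg', 'gg', 'gg']
```

On the example data this gives the expected dataframe shown above, with 'aa' and 'ee' replaced and 'bb' and 'gg' left unchanged.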