I have a pandas DataFrame with hundreds of categorical features (encoded as numbers). I want to keep only the top values in each column and lump everything else together as "others". I already know that each column has only 3 or 4 dominant values, but I want to select them automatically. I need two ways to do it:
1) Keep only the values that take the top 3 places by frequency. Note: no column has only 1, 2 or 3 unique values (each has about 20), so that case can be ignored. Values with equal counts share a place, and all of them are kept; for example, if several values tie for third place, keep them all. For example:
#after you use value_counts() on column 1
1 35
2 23
3 10
4 9
8 8
6 8
#after you use value_counts() on column 2
0 23
2 15
1 15 #two second places
4 9
5 3
6 2
#desired value_counts() result on column 1
1 35
2 23
3 10
others 25 #9+8+8
#desired value_counts() result on column 2
0 23
2 15
1 15
4 9
others 5 #3+2
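A minimal sketch of approach 1, assuming pandas and assuming ties are handled by dense ranking (equal counts share a place and the next count takes the next place), which is how column 2 above behaves: 23 is first, both 15s are second, 9 is third. `keep_top_places` and `keep_top_places_frame` are hypothetical helper names; every value outside the kept places is replaced with the string "others".

```python
import pandas as pd


def keep_top_places(s: pd.Series, n: int = 3, other: str = "others") -> pd.Series:
    """Hypothetical helper: keep values whose count is among the n largest distinct counts."""
    counts = s.value_counts()
    # Dense ranking: equal counts share a place, the next count takes the next place.
    places = counts.rank(method="dense", ascending=False)
    keep = counts[places <= n].index
    # Cast to object so integer codes and the "others" label can share one column.
    return s.astype(object).where(s.isin(keep), other)


def keep_top_places_frame(df: pd.DataFrame, n: int = 3) -> pd.DataFrame:
    # DataFrame.apply passes each column as a Series; extra keyword args go to the function.
    return df.apply(keep_top_places, n=n)


# Columns from the example above, rebuilt from their value counts.
col1 = pd.Series([1] * 35 + [2] * 23 + [3] * 10 + [4] * 9 + [8] * 8 + [6] * 8)
col2 = pd.Series([0] * 23 + [2] * 15 + [1] * 15 + [4] * 9 + [5] * 3 + [6] * 2)
print(keep_top_places(col1).value_counts())
print(keep_top_places(col2).value_counts())
```

Dense ranking is what makes column 2 keep four values: switching `method="dense"` to `"min"` would rank the 9 as fourth and drop it.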
2) Keep as many of the most frequent values in each column as needed so that the combined count of the remaining values (grouped as "others") is less than the count of the last value kept. For example:
#after you use value_counts() on column 1
1 35
2 23
3 10
4 3
8 2
6 1
#after you use value_counts() on column 2
0 23
2 15
1 9
4 8
5 3
6 2
#desired value_counts() result on column 1
1 35
2 23
3 10
others 6 #3+2+1
#desired value_counts() result on column 2
0 23
2 15
1 9
4 8
others 5 #3+2
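A minimal sketch of approach 2, assuming the rule means: walk down the sorted counts and stop at the first position where the combined count of everything below it is smaller than the count at that position. `keep_until_small_tail` is a hypothetical helper name, and this is one reading of the rule, not a confirmed one.

```python
import numpy as np
import pandas as pd


def keep_until_small_tail(s: pd.Series, other: str = "others") -> pd.Series:
    """Hypothetical helper: keep the k most frequent values, where k is the first
    position whose count exceeds the combined count of all values after it."""
    counts = s.value_counts()  # sorted by count, largest first
    values = counts.to_numpy()
    # tail[i] = combined count of all values ranked after position i
    tail = values.sum() - np.cumsum(values)
    hits = np.flatnonzero(tail < values)
    # The last position always qualifies (its tail is 0), so hits is empty only for an empty Series.
    k = hits[0] + 1 if hits.size else 0
    keep = counts.index[:k]
    # Cast to object so integer codes and the "others" label can share one column.
    return s.astype(object).where(s.isin(keep), other)


# Columns from the example above, rebuilt from their value counts.
col1 = pd.Series([1] * 35 + [2] * 23 + [3] * 10 + [4] * 3 + [8] * 2 + [6] * 1)
col2 = pd.Series([0] * 23 + [2] * 15 + [1] * 9 + [4] * 8 + [5] * 3 + [6] * 2)
print(keep_until_small_tail(col2).value_counts())
# df.apply(keep_until_small_tail) would run it on every column of a DataFrame df.
```

Under this reading, column 2 comes out exactly as listed (others = 3+2 = 5 < 8). Column 1, however, would already stop after value 2, because 10+3+2+1 = 16 < 23, giving 1, 2 and others = 16 instead of the three values listed; if value 3 should also be kept, the stopping condition needs a different formulation. If two values have equal counts, their relative order in `value_counts()` decides which side of the cut each lands on.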
Please show how to do both. Thanks.