I have a Pandas DataFrame with categorical data in one of its columns. When I call value_counts()
on that column, I get something like:
HR 176
Coding 81
Reject 74
Database Administration 21
Finance 17
Project Management 16
Sales 15
DevOps 13
Core Electronics 10
Networking 10
Medical Science 9
Core Mechanical 8
Web Development 4
Puzzles 3
behavioural 3
not a question 2
civil engineering 1
Mathematics 1
Finance, Medical Science 1
Sales, HR 1
What I'd like to do is keep only the categories whose count is >= some threshold (e.g. 10) and group all the smaller categories into a single "Other" category, i.e. the result should look like:
HR 176
Coding 81
Reject 74
Other 33
Database Administration 21
Finance 17
Project Management 16
Sales 15
DevOps 13
Core Electronics 10
Networking 10
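A minimal sketch of one way to get this table directly from the value_counts() result, assuming the counts are already in a Series. The helper name `collapse_small_counts` and its `other_label` parameter are hypothetical, chosen for illustration. It keeps the entries at or above the threshold, sums the rest into one "Other" row with `pd.concat`, and re-sorts so "Other" lands in count order.

```python
import pandas as pd


def collapse_small_counts(counts: pd.Series, threshold: int, other_label: str = "Other") -> pd.Series:
    """Keep entries with count >= threshold and sum the rest into one `other_label` entry."""
    keep = counts[counts >= threshold]
    other_total = counts[counts < threshold].sum()
    if other_total > 0:
        # Append the aggregated small categories as a single row.
        keep = pd.concat([keep, pd.Series({other_label: other_total})])
    # Re-sort by count so "Other" lands in count order, as in the desired output.
    return keep.sort_values(ascending=False)


# Rebuild the value_counts() result from the question by hand.
counts = pd.Series({
    "HR": 176, "Coding": 81, "Reject": 74, "Database Administration": 21,
    "Finance": 17, "Project Management": 16, "Sales": 15, "DevOps": 13,
    "Core Electronics": 10, "Networking": 10, "Medical Science": 9,
    "Core Mechanical": 8, "Web Development": 4, "Puzzles": 3, "behavioural": 3,
    "not a question": 2, "civil engineering": 1, "Mathematics": 1,
    "Finance, Medical Science": 1, "Sales, HR": 1,
})

result = collapse_small_counts(counts, threshold=10)
print(result)
```

The order among ties (e.g. the two categories with 10) is not guaranteed by the default sort; pass `kind="stable"` to `sort_values` if that matters.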
In the past I've done this by hacking together a defaultdict(int)
and keeping only the entries where count >= threshold. Is there a canonical Pandas way to achieve the same thing?
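As an alternative to the defaultdict bookkeeping, a sketch that relabels the column itself, so a later value_counts() (or groupby) already sees "Other". The function name `lump_rare_categories` and the sample data are hypothetical. It assumes an object/string column; for a `category` dtype you would likely need to add the new label first (e.g. with `cat.add_categories`).

```python
import pandas as pd


def lump_rare_categories(s: pd.Series, threshold: int, other_label: str = "Other") -> pd.Series:
    """Replace every value whose frequency is below `threshold` with `other_label`."""
    counts = s.value_counts()
    frequent = counts[counts >= threshold].index
    # Series.where keeps values where the condition holds and substitutes the rest.
    return s.where(s.isin(frequent), other_label)


# Hypothetical column: expand a few of the counts from the question back into raw rows.
sample_counts = pd.Series({"HR": 176, "Coding": 81, "Networking": 10,
                           "Medical Science": 9, "Core Mechanical": 8, "Puzzles": 3})
s = pd.Series(sample_counts.index.repeat(sample_counts.values))

lumped = lump_rare_categories(s, threshold=10)
print(lumped.value_counts())
```

Relabeling the raw column rather than the counts keeps the row-level data consistent, which is handy if you group or plot on that column afterwards.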