This question is related to "How can I replace a factor levels with the top n levels (by some metric), plus [other]?". As the metric, I want to use the number of occurrences of each factor level. I know I can do it by first building a list of the occurrences, but I was wondering whether there is a cleaner way.
Example:
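library(data.table)
library(plyr)  # provides count()
fac <- data.table(score = as.factor(c(3, 4, 5, 3, 3, 3, 5)))
# Frequency table of the levels (columns: x, freq)
ocCnt <- data.table(lapply(fac, count)$score)
fac$occurrence <- 0
# Look up the frequency of each row's level
for (i in seq_along(fac$score)) {
  fac$occurrence[i] <- ocCnt[x == fac$score[i]]$freq
}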
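For comparison, the same per-row counts can be obtained without the explicit loop. This is only a minimal sketch in base R, assuming the scores are held in a plain factor vector; the variable names `score` and `occurrence` are used here for illustration.

```r
# Same example data as above, as a plain factor vector
score <- as.factor(c(3, 4, 5, 3, 3, 3, 5))

# tabulate() counts how often each level code occurs (one count per level);
# indexing by the integer codes maps those counts back onto every row
occurrence <- tabulate(score, nbins = nlevels(score))[as.integer(score)]
occurrence
```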
Then I could use the function described in the referenced question/answer:
hotfactor <- function(fac, by, n = 10, o = "other") {
  # Rename every level ranked below the top n (by summed `by`) to `o`
  levels(fac)[rank(-xtabs(by ~ fac))[levels(fac)] > n] <- o
  fac
}
Continuing the example, to keep only the most popular level we call:
hotfactor(fac$score, fac$occurrence, 1)
This gives:
[1] 3 other other 3 3 3 other
Levels: 3 other
So my question is: can I do this without first having to add a list that counts the occurrences?
Note that I want this to work for the n most popular levels, not just for the single most popular one.
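To make the goal concrete, here is a minimal sketch of what such a function might look like if the counts are computed internally from the factor itself. The helper name `top_n_levels` and the tie handling are assumptions made for illustration and do not come from the referenced answer.

```r
# Hypothetical helper: keep the n most frequent levels, merge the rest into `o`
top_n_levels <- function(fac, n = 10, o = "other") {
  # Occurrences per level, in the order of levels(fac)
  counts <- tabulate(fac, nbins = nlevels(fac))
  # Rank 1 = most frequent; with ties.method = "min", levels tied at the
  # cut-off are all kept, so more than n levels may survive (an assumption)
  keep <- rank(-counts, ties.method = "min") <= n
  # Assigning the same name to several levels merges them into one
  levels(fac)[!keep] <- o
  fac
}

score <- as.factor(c(3, 4, 5, 3, 3, 3, 5))
top_n_levels(score, 1)
# [1] 3     other other 3     3     3     other
# Levels: 3 other
```

Called with `n = 2`, the same sketch would keep levels 3 and 5 and merge only level 4 into "other".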