In a regression problem, I have many categorical predictors (factors), several of which have a large number of levels (one of them has 2000). A regression with such a predictor would be heavily over-parameterized, so I am wondering whether there is a way to collapse the many rare levels of such a variable into a single "other" level.

I could use the factor function in R, for example:

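lev <- names(sort(table(oldx), decreasing = TRUE))  # levels, most frequent first
newx <- factor(oldx, levels = lev, labels = c(lev[1:3], rep("other", length(lev) - 3)))  # keep top 3, merge the rest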

for each variable, so that the more common levels are preserved and the less common ones are mapped to "other" (judging frequency from table(oldx)). However, I was wondering whether R already has a standard way of doing this. Also, is there anything else one should be careful about?
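To make the idea concrete, here is a minimal base-R sketch of the collapsing step. The helper name collapse_rare, its min_count threshold, and the toy data are all hypothetical and chosen only for illustration. It relies on the fact that assigning the same name to several factor levels merges them.

```r
# Hypothetical helper: merge every level observed fewer than `min_count`
# times into a single level named `other`.
collapse_rare <- function(x, min_count = 5, other = "other") {
  x <- as.factor(x)
  counts <- table(x)                          # frequency of each level
  rare <- names(counts)[counts < min_count]   # levels below the threshold
  # Giving several levels the same name merges them into one level.
  levels(x)[levels(x) %in% rare] <- other
  x
}

# Toy data, made up for illustration: three common and four rare levels.
oldx <- factor(c(rep("a", 10), rep("b", 8), rep("c", 6), "d", "e", "f", "g"))
newx <- collapse_rare(oldx, min_count = 5)
table(newx)
```

A count threshold is only one possible rule; keeping the top k levels or a minimum proportion would work the same way. If the forcats package is available, fct_lump_min() and fct_lump_n() cover these cases directly.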

Thanks

lovalery