library(randomForest)
library(mlbench)               # provides the BostonHousing2 data set
data(BostonHousing2)
set.seed(113, "L'Ecuyer")
plot(randomForest(cmedv ~ ., data = BostonHousing2,
        keep.forest = FALSE))

Error in randomForest.default(m, y, ...) : Can not handle categorical 
predictors with more than 53 categories.

Here is the `str` output for my dataset:

str(BostonHousing2)

$ town   : Factor w/ 92 levels "Arlington","Ashland",..: 54 77 77 46 46 46 69 69 69 69 ...

$ tract  : int  2011 2021 2022 2031 2032 2033 2041 2042 2043 2044 ...

$ lon    : num  -71 -71 -70.9 -70.9 -70.9 ...

$ lat    : num  42.3 42.3 42.3 42.3 42.3 ...

$ medv   : num  24 21.6 34.7 33.4 36.2 28.7 22.9 27.1 16.5 18.9 ...

$ cmedv  : num  24 21.6 34.7 33.4 36.2 28.7 22.9 22.1 16.5 18.9 ...

$ crim   : num  0.00632 0.02731 0.02729 0.03237 0.06905 ...

$ zn     : num  18 0 0 0 0 0 12.5 12.5 12.5 12.5 ...

$ indus  : num  2.31 7.07 7.07 2.18 2.18 2.18 7.87 7.87 7.87 7.87 ...

$ chas   : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...

$ nox    : num  0.538 0.469 0.469 0.458 0.458 0.458 0.524 0.524 0.524 0.524 ...

$ rm     : num  6.58 6.42 7.18 7 7.15 ...

$ age    : num  65.2 78.9 61.1 45.8 54.2 58.7 66.6 96.1 100 85.9 ...

$ dis    : num  4.09 4.97 4.97 6.06 6.06 ...

$ rad    : int  1 2 2 3 3 3 5 5 5 5 ...

$ tax    : int  296 242 242 222 222 222 311 311 311 311 ...

$ ptratio: num  15.3 17.8 17.8 18.7 18.7 18.7 15.2 15.2 15.2 15.2 ...

$ b      : num  397 397 393 395 397 ...

$ lstat  : num  4.98 9.14 4.03 2.94 5.33 ...
Bernhard
  • It tells you the issue. The output shows that the issue is present. You don't ask any question. What do you want? – Dason Apr 24 '18 at 13:09
  • The error is pretty self-explanatory: your predictor `town` has 92 levels, but `randomForest` allows at most 53. See [this post on Cross Validated](https://stats.stackexchange.com/questions/49243/rs-randomforest-can-not-handle-more-than-32-levels-what-is-workaround) for an extended discussion. – Maurits Evers Apr 24 '18 at 13:09
  • I removed the `town` and `medv` columns from the data set and then used the `randomForest` function. – vidhi amin Apr 24 '18 at 15:07

1 Answer


The `randomForest` package limits how many levels a categorical predictor can have: 53 in the version that produced the error above, and 32 in older versions. The way forward is to reduce the number of levels. One option is binning: for example, if the categories can be ranked by a numeric quantity, `ntile()` in dplyr can split them into deciles, which leaves only 10 levels.
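Another way to reduce the number of levels, sketched here in base R, is to keep the most frequent levels and merge the rest into a single "Other" level. Note that this is a different technique from the decile binning mentioned above. `lump_levels` is a hypothetical helper written for this illustration (it is not part of `randomForest`), and the synthetic `town` vector only stands in for the real 92-level column.

```r
# Hypothetical helper (not part of randomForest): keep the most frequent
# levels of a factor and merge all remaining levels into one "Other" level.
lump_levels <- function(f, max_levels = 53, other = "Other") {
  f <- droplevels(as.factor(f))
  if (nlevels(f) <= max_levels) return(f)          # nothing to do
  counts <- sort(table(f), decreasing = TRUE)      # most frequent first
  keep <- names(counts)[seq_len(max_levels - 1)]   # reserve one slot for "Other"
  x <- as.character(f)
  x[!is.na(x) & !(x %in% keep)] <- other           # leave missing values as NA
  factor(x, levels = c(keep, other))
}

# Synthetic stand-in for a 92-level column: level k occurs k times
town <- factor(rep(sprintf("town%02d", 1:92), times = 1:92))
town53 <- lump_levels(town, max_levels = 53)
nlevels(town53)
```

Assuming the data set from the question is loaded, the same call could be applied as `BostonHousing2$town <- lump_levels(BostonHousing2$town)` before fitting the forest. Whether lumping rare towns together is acceptable depends on the analysis; dropping the column, as the asker did, is the simpler alternative.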

arasif