
I am confused by the different results I obtain from two functions used with the randomForest package in R to assess variable importance.

My model is defined as:

library(randomForest)
model <- randomForest(nee_m ~ ., data = nee, ntree = 200, mtry = 7, importance = TRUE)

nee_m is the response variable, and the 8 explanatory variables are sw_in, ta, vpd, rew, rh, pluie, vent and co2.

Here are my results. (1) Using varImp(model) (from the caret package), the result for %IncMSE is:

      Overall
ta    118.08770
rh     71.48408
vpd    62.24601
pluie  23.28636
vent  151.23066
sw_in 886.14511
co2   208.20772
rew   305.57892

(2) Using model$importance[order(model$importance[, 1], decreasing = TRUE), ], the result is:

      %IncMSE      IncNodePurity
sw_in 43.12005718  1451599.722
vpd    4.70746641   201849.024
rew    4.16280001   189716.854
ta     4.02571339   121612.437
rh     2.73049849   102672.109
co2    1.37747947    81391.062
vent   0.57235041    61368.274
pluie  0.02396995     2851.669

If I sort (1) in decreasing order, as in (2), and express each value as a percentage of the total, I get (3):

      Overall (%)
sw_in 48.522222
rew   16.732438
co2   11.400730
vent   8.2808645
ta     6.4660714
rh     3.914219
vpd    3.408375
pluie  1.275080
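The relative values in (3) are just each value of (1) divided by the sum of (1), times 100, after sorting. A minimal base-R sketch that reproduces (3) from the numbers posted above, and applies the same transformation to the %IncMSE column of (2) so both can be compared on the same footing (rel_share is a hypothetical helper name, not part of any package):

```r
# Values copied from table (1): default varImp(model)
imp_scaled <- c(ta = 118.08770, rh = 71.48408, vpd = 62.24601,
                pluie = 23.28636, vent = 151.23066, sw_in = 886.14511,
                co2 = 208.20772, rew = 305.57892)

# Values copied from the %IncMSE column of table (2): model$importance
imp_raw <- c(sw_in = 43.12005718, vpd = 4.70746641, rew = 4.16280001,
             ta = 4.02571339, rh = 2.73049849, co2 = 1.37747947,
             vent = 0.57235041, pluie = 0.02396995)

# Hypothetical helper: sort decreasingly, then express as % of the total
rel_share <- function(x) {
  x <- sort(x, decreasing = TRUE)
  100 * x / sum(x)
}

share_scaled <- rel_share(imp_scaled)  # reproduces table (3)
share_raw <- rel_share(imp_raw)        # same transformation applied to (2)

print(round(share_scaled, 2))
print(round(share_raw, 2))
```

Normalising to a percentage does not change the ranking, so (3) carries exactly the same order as (1); the disagreement with (2) already exists before this step.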

The values and the order of importance differ between (2) and (3). Which results should I trust, and why are they different? I may be missing something in my understanding of the results.
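One way to see why the order changes, assuming that the default varImp(model) equals the unscaled %IncMSE of (2) divided by a per-variable standard error (this is what randomForest::importance() does with scale = TRUE, and it would match the EDIT below): the ratio (2)/(1) then recovers that divisor for each variable. A minimal base-R sketch using only the posted numbers (implied_sd is a hypothetical name):

```r
imp_scaled <- c(ta = 118.08770, rh = 71.48408, vpd = 62.24601,
                pluie = 23.28636, vent = 151.23066, sw_in = 886.14511,
                co2 = 208.20772, rew = 305.57892)               # table (1)
imp_raw <- c(sw_in = 43.12005718, vpd = 4.70746641, rew = 4.16280001,
             ta = 4.02571339, rh = 2.73049849, co2 = 1.37747947,
             vent = 0.57235041, pluie = 0.02396995)              # table (2)

# Put (1) in the same variable order as (2)
imp_scaled <- imp_scaled[names(imp_raw)]

# Implied per-variable divisor, under the assumption (1) = (2) / SD
implied_sd <- imp_raw / imp_scaled

# Rank of each variable under each measure (1 = most important)
ranks <- data.frame(
  raw_rank    = rank(-imp_raw),
  scaled_rank = rank(-imp_scaled),
  implied_sd  = signif(implied_sd, 3),
  row.names   = names(imp_raw)
)
print(ranks)
```

Because the divisor is different for every variable, dividing by it can reorder the variables: one whose %IncMSE is small but very stable across trees (small standard error) moves up once scaled.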

Thanks in advance for your help.

EDIT: A very strange result: if I set varImp(model, scale=FALSE), I get exactly the same result as (2), in both values and order, i.e. values between 0 and 100. This is the opposite of what the documentation explains: with scale=FALSE the values should not be between 0 and 100; only the default varImp(model) (or varImp(model, scale=TRUE)) is supposed to return values between 0 and 100... This is not what I have in (3)... So which results make sense? I am really confused...
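A minimal sketch of what scale does in randomForest::importance(), assuming that caret's varImp(), when given a plain randomForest object, passes scale on to that function (consistent with the observation that varImp(model, scale=FALSE) reproduces (2)); the 0-100 rescaling described in the caret documentation presumably refers to models fitted with caret::train(). With scale = TRUE, each %IncMSE value is divided by its standard error (fit$importanceSD). The built-in mtcars data stands in for nee, and the seed and object names are arbitrary:

```r
library(randomForest)

set.seed(1)  # permutation importance is random; fix the seed
# mtcars stands in for the 'nee' data (illustration only)
fit <- randomForest(mpg ~ ., data = mtcars, ntree = 200, importance = TRUE)

# Unscaled %IncMSE: the same numbers stored in fit$importance
raw <- importance(fit, type = 1, scale = FALSE)[, 1]

# Default scale = TRUE: %IncMSE divided by its standard error
scaled <- importance(fit, type = 1, scale = TRUE)[, 1]

# Reproduce the scaling by hand; near-zero SDs are replaced by 1,
# mirroring (as far as I know) the guard inside importance()
sd_guarded <- ifelse(fit$importanceSD < .Machine$double.eps, 1,
                     fit$importanceSD)
by_hand <- fit$importance[, "%IncMSE"] / sd_guarded

print(cbind(raw, scaled, by_hand))

# The two rankings need not agree
print(names(sort(raw, decreasing = TRUE)))
print(names(sort(scaled, decreasing = TRUE)))
```

If the two printed rankings differ, it is only because the standard errors differ from one variable to the next; the underlying permutation measure is the same.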

virginie
  • https://stackoverflow.com/questions/37888619/difference-between-varimp-caret-and-importance-randomforest-for-random-fores – user2974951 Jan 22 '19 at 12:40
  • Yes, I saw that, thank you, but their RF is for classification (node purity), not regression. I don't really see a clear explanation there, or at least it is still not clear to me... I am working on it – virginie Jan 22 '19 at 13:39
  • Do you mean varImpPlot()? There is no varImp() function in the randomForest package. – Scholar Jan 23 '19 at 13:16
  • True, there is not, but from what I understand you can use varImp() if the object you pass to it is a random forest model. Actually, I installed the "caret" package, and then I can use varImp() on my random forest model saved as model, which was built with the randomForest function (from the randomForest package indeed) – virginie Jan 23 '19 at 13:37
