I am a statistics student using R. I tried to fit a random forest model with the cforest function from the party package. I chose it because my data contain many different types of variables (continuous and categorical, with different scales of measurement). The subtlety is that my data are weighted: each observation has its own weight, stored in a numeric (decimal) vector named weight in my data.

First, to fit the random forest model I use the weights argument of cforest. The problem comes when I try to compute variable importance with varimp. I do get results, but an error message appears: "'varimp' with non-unity weights might give misleading results".

PS: my outcome variable is continuous

Can you help me correct this error, please? Here is minimal code to reproduce it:

weight <- c(rep(0.3, 5), rep(1.2, 10), rep(2.5, 10), rep(0.9, 5)) # 30 observations
a <- rbinom(30, 1, 0.5)
b <- rbinom(30, 1, 0.7)
c <- rbinom(30, 1, 0.6)
d <- rbinom(30, 1, 0.5)
e <- rnorm(30, mean = 3)
level <- c(rep("low", 10), rep("medium", 5), rep("high", 15))
outcome <- rnorm(30, mean = 10, sd = 2)

data <- data.frame(weight, a, b, c, d, e, level = as.factor(level), outcome)

library(party)
res <- cforest(outcome ~ ., data = data[, -1], weights = data$weight)
varimp(res)

Warning message: In varimp(res) : ‘varimp’ with non-unity weights might give misleading results
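
To make "non-unity weights" concrete, here is a minimal base-R sketch that turns the weight vector from the example into sampling probabilities and draws a weighted bootstrap sample of row indices, so that every resampled row carries a weight of 1. This is only an illustration under the assumption that the weights are meant as relative sampling weights; it is not a description of what cforest or varimp do internally, and the seed and the names prob and idx are hypothetical choices for this sketch.

```r
set.seed(1)  # assumption: fixed seed only so the sketch is repeatable

# Same weights as in the example above (30 observations).
weight <- c(rep(0.3, 5), rep(1.2, 10), rep(2.5, 10), rep(0.9, 5))

# Normalise the weights so they sum to 1 and can serve as sampling probabilities.
prob <- weight / sum(weight)

# Draw a weighted bootstrap sample of row indices, same size as the data.
idx <- sample(seq_along(weight), size = length(weight), replace = TRUE, prob = prob)

# Count how often each weight group was drawn; heavier rows tend to appear more often.
table(weight[idx])
```

An unweighted model could then be fitted on data[idx, ] as a rough sensitivity check, though whether that matches the intended meaning of the weights depends on how they were constructed.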

NelsonGon
Oumou S
  • Welcome to SO. For these kinds of questions ("why do I get error XYZ and how can I fix it?") it is often critical to provide sample data and code to reproduce the error. You may also want to review how to provide a [minimal reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). If you can't share your full data, give a representative subsample of it or provide code to generate representative mock data. – Maurits Evers Feb 03 '19 at 07:36
  • I've edited your question to fix (some of) the language/spelling issues. Why undo those changes? "Subtility" is not an English word. – Maurits Evers Feb 03 '19 at 07:39
  • Thank you so much. I have edited it, sorry. I will see how I can produce a minimal example using your link. Thanks – Oumou S Feb 03 '19 at 07:43

1 Answer

Here is the minimal code that reproduces the error. In this example I get the same error:

weight <- c(rep(0.3, 5), rep(1.2, 10), rep(2.5, 10), rep(0.9, 5)) # 30 observations
a <- rbinom(30, 1, 0.5)
b <- rbinom(30, 1, 0.7)
c <- rbinom(30, 1, 0.6)
d <- rbinom(30, 1, 0.5)
e <- rnorm(30, mean = 3)
level <- c(rep("low", 10), rep("medium", 5), rep("high", 15))
outcome <- rnorm(30, mean = 10, sd = 2)

data <- data.frame(weight, a, b, c, d, e, level = as.factor(level), outcome)

library(party)
res <- cforest(outcome ~ ., data = data[, -1], weights = data$weight)
varimp(res)
Warning message:
In varimp(res) :
  ‘varimp’ with non-unity weights might give misleading results

Here, varimp gives 0, but that is not the problem. Thank you for your help.

Oumou S
  • Edit your question and add this to the question. Not as an answer. – NelsonGon Feb 03 '19 at 08:56
  • Initial post already edited. Thanks for your help. How can I fix this error? – Oumou S Feb 03 '19 at 09:25
  • @OumouS Well, to start, it's a *warning* and not an *error*. An error would indicate a *failure* of the method, while a *warning* calls for a critical interpretation/assessment of the results. – Maurits Evers Feb 03 '19 at 10:16
  • @MauritsEvers OK thanks for your explanations. Indeed, my final objective is to compute the variable importance of each predictor. Do you think that the results could be misleading because of this warning? Meaning that my results are wrong?? – Oumou S Feb 03 '19 at 10:50
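
The warning/error distinction discussed in the comments can be checked in base R. The sketch below uses two hypothetical toy functions (warn_then_return and fail, not part of party) to show that a warning still lets the call return its value, while an error aborts it. It also shows how withCallingHandlers can record a warning's message without losing the result, which is one possible way to log a warning such as the one from varimp while keeping the computed importances.

```r
# Hypothetical toy function: signals a warning, then still returns a value.
warn_then_return <- function() {
  warning("results might be misleading")
  42
}

# Hypothetical toy function: signals an error, so no value is returned.
fail <- function() stop("cannot continue")

# Classify what an expression does: "ok", "warning", or "error".
classify_condition <- function(expr) {
  tryCatch(
    {
      expr  # the promise is evaluated here, inside tryCatch
      "ok"
    },
    warning = function(w) "warning",
    error = function(e) "error"
  )
}

# Record warning messages while keeping the returned value.
captured <- character(0)
result <- withCallingHandlers(
  warn_then_return(),
  warning = function(w) {
    captured <<- c(captured, conditionMessage(w))
    invokeRestart("muffleWarning")  # continue after recording the warning
  }
)

kinds <- c(
  classify_condition(1 + 1),
  classify_condition(warn_then_return()),
  classify_condition(fail())
)
print(kinds)
print(result)
print(captured)
```

Note that recording or muffling a warning only changes how it is reported; it does not address the reason the warning was raised.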