I am a statistics student using R. I tried to fit a random forest model using the cforest function from the party package. I chose it because my data contains many different types of variables (continuous, and categorical with different scales of measurement). The subtlety is that my data are weighted: each observation has its own weight, stored in a decimal (non-integer) vector named weight in my data.
First, to fit the random forest I use the weights argument of the cforest function. The problem arises when I try to compute variable importance with varimp. Although I do get results, a warning message appears: "‘varimp’ with non-unity weights might give misleading results".
PS: my outcome variable is continuous.
Can you help me resolve this warning, please? Here is a minimal example that reproduces it:
set.seed(1) # fix the random draws so the example is reproducible
weight <- c(rep(0.3, 5), rep(1.2, 10), rep(2.5, 10), rep(0.9, 5)) # 30 observations
a <- rbinom(30, 1, 0.5)
b <- rbinom(30, 1, 0.7)
c <- rbinom(30, 1, 0.6)
d <- rbinom(30, 1, 0.5)
e <- rnorm(30, mean = 3)
level <- c(rep("low", 10), rep("medium", 5), rep("high", 15))
outcome <- rnorm(30, mean = 10, sd = 2)
data <- data.frame(weight, a, b, c, d, e, level = as.factor(level), outcome)
library(party)
res <- cforest(outcome ~ ., data = data[, -1], weights = data$weight)
varimp(res)
# Warning message:
# In varimp(res) : ‘varimp’ with non-unity weights might give misleading results
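To check that this really is a warning and not an error (that is, that a value is still returned), the call can be wrapped in a small base R helper. This is only a sketch: capture_warnings and fake_varimp are hypothetical names introduced here for illustration, and fake_varimp is a stand-in that mimics the behaviour described above so that the snippet runs without the party package. It does not address whether the importances are trustworthy with these weights.

```r
# Hypothetical helper: evaluate an expression, collect any warnings it
# signals, and still return the value the expression produced.
capture_warnings <- function(expr) {
  msgs <- character(0)
  value <- withCallingHandlers(
    expr,
    warning = function(w) {
      # store the warning text, then suppress the default printing
      msgs <<- c(msgs, conditionMessage(w))
      invokeRestart("muffleWarning")
    }
  )
  list(value = value, warnings = msgs)
}

# Hypothetical stand-in for varimp(): returns a named vector of
# made-up importances and signals a warning along the way.
fake_varimp <- function() {
  warning("non-unity weights might give misleading results")
  c(a = 0.1, b = 0.2)
}

out <- capture_warnings(fake_varimp())
out$value     # the returned values are still available
out$warnings  # the warning text, captured instead of printed
```

With the model fitted above, the corresponding call would be capture_warnings(varimp(res)), assuming party is loaded and res exists.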