I would like to compare several models (multiple regression, LASSO, Ridge, GBM) in terms of variable importance, but I'm not sure the procedure is correct because the values obtained are not on the same scale.
For multiple regression and GBM, the values range from 0 to 100 using varImp from the caret package. The statistic is calculated differently for each method:
Linear Models: the absolute value of the t-statistic for each model parameter is used.
Boosted Trees: this method uses the same approach as a single tree, but sums the importance of each boosting iteration.
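To make the linear-model case concrete, here is a minimal base-R sketch of that statistic. It assumes the 0 to 100 range comes from a min-max rescaling of the raw values (my understanding of what varImp does for train objects when scale = TRUE). The mtcars data, the chosen predictors, and the helper name rescale_0_100 are illustrative and not part of my actual analysis.

```r
# Fit an ordinary linear model on a built-in data set (illustrative choice).
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# Absolute value of the t-statistic for each parameter, intercept dropped.
tstat <- abs(coef(summary(fit))[, "t value"])
tstat <- tstat[names(tstat) != "(Intercept)"]

# Hypothetical helper: min-max rescaling to 0-100.
# Assumption: this mirrors the scaling behind the 0-100 values from varImp.
rescale_0_100 <- function(x) (x - min(x)) / (max(x) - min(x)) * 100

imp_lm <- rescale_0_100(tstat)
print(imp_lm)
```

Under this assumption the least important predictor always gets 0 and the most important gets 100, so the numbers are relative within one model rather than absolute.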
For LASSO and Ridge, however, the values range from 0.00 to 0.99, calculated with this function:
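varImp <- function(object, lambda = NULL, ...) {
  # Extract the coefficients at the requested penalty value
  beta <- predict(object, s = lambda, type = "coef")
  if (is.list(beta)) {
    # Multi-response case: one column of coefficients per element
    out <- do.call("cbind", lapply(beta, function(x) x[, 1]))
    out <- as.data.frame(out)
  } else {
    out <- data.frame(Overall = beta[, 1])
  }
  # Importance = absolute coefficient, with the intercept dropped
  out <- abs(out[rownames(out) != "(Intercept)", , drop = FALSE])
  out
}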
This function was obtained here: Caret package - glmnet variable importance
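Since the function above returns raw absolute coefficients, one option I can think of is to apply the same kind of min-max rescaling to its output. The sketch below is only that, under stated assumptions: the data frame imp_glmnet holds made-up values shaped like the function's output, and rescale_0_100 is a hypothetical helper, not something taken from caret or glmnet.

```r
# Made-up absolute coefficients, shaped like the output of the function above.
imp_glmnet <- data.frame(
  Overall = c(0.42, 0.05, 0.00, 0.99),
  row.names = c("x1", "x2", "x3", "x4")
)

# Hypothetical helper: min-max rescaling to 0-100, guarding against a
# constant column (which would otherwise divide by zero).
rescale_0_100 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  if (rng[1] == rng[2]) return(rep(0, length(x)))
  (x - rng[1]) / (rng[2] - rng[1]) * 100
}

# Rescale each column while keeping the predictor names as row names.
imp_scaled <- imp_glmnet
imp_scaled[] <- lapply(imp_glmnet, rescale_0_100)
print(imp_scaled)
```

This would put the ranges on the same footing, but the underlying statistics (absolute coefficients, absolute t-statistics, summed tree importances) would still measure different things, so it may only be safe to compare the ordering of the variables across models.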
I was guided by other questions on the forum, but I could not understand why the scales differ. How can I make these measurements comparable?