I trained a model with the following code:

set.seed(123)
xgbTree_model <- train(X_train,
                       y_train,
                       trControl = control,
                       method = "xgbTree",
                       metric = "RMSE",
                       preProcess = c("center","scale"),
                       importance = TRUE)

If I run this function:

varImp(xgbTree_model)

I am getting the following results:

> varImp(xgbTree_model)
xgbTree variable importance

  only 20 most important variables shown (out of 101)

                    Overall
OverallQual          100.00
GrLivArea             78.50
LotArea               30.31
TotalBsmtSF           27.49
Fireplaces            14.18
Age                    8.34
BsmtFinType1Unf        7.22
GarageYrBlt            5.73
CentralAirN            5.64
KitchenQualEx          5.42
KitchenQualTA          5.20
CentralAirY            4.20
BsmtQualTA             4.01
BsmtFinType1GLQ        3.84
NeighborhoodOldTown    1.96
Exterior1stBrkComm     1.88
BsmtFullBath           1.35
NeighborhoodIDOTRR     1.34
FoundationBrkTil       1.24
TotRmsAbvGrd           1.18
> 

I would like to use a for loop to grab the first column of names and use it to drop columns from my existing table. I am trying to get rid of all the columns that fall below a given Overall value in the list. I tried to convert the list to a data.frame, but I lose the data I need because the conversion adds its own column names. This is the code I used:

corCol <- data.frame(matrix(unlist(l), nrow = length(l), byrow = TRUE))

Is there a way in R to grab the left column from the output of varImp(xgbTree_model) with a for loop?

Thank you for your support and recommendation.

Johnny
  • I'm not familiar with those functions, but you could try using the `colnames` function to get the column names: `colnames(varImp(xgbTree_model))`. – Mosquite Aug 20 '20 at 17:07
  • The colnames() function returns NULL. – Johnny Aug 20 '20 at 17:13
  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick Aug 20 '20 at 22:38

1 Answer

The varImp object is a bit annoying since the 'first column' is actually the row names. This has caused confusion for me in the past.

You can move the row names into a regular column of the data.frame with the tibble function rownames_to_column():

library(dplyr)  # provides the %>% pipe used below
varimps <- varImp(xgbTree_model)$importance
varimps <- varimps %>% 
   tibble::rownames_to_column()  # row names become a column called "rowname"

After that, it is easy to extract or filter whatever you want.

For example, if you want to extract all the variables with a score above 10:

varimpsKeep <- varimps %>% dplyr::filter(Overall>10)

or extract the top n variables as a character vector:

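varimps <- varimps %>%
  dplyr::arrange(dplyr::desc(Overall))  # highest importance first
n <- 10  # number of top variables to keep (example value, choose your own)
my_wanted_variables <- varimps$rowname[1:n]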
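
To connect this back to the original goal of dropping low-importance columns from the table, here is a minimal base R sketch that needs no loop. It is only an illustration under stated assumptions: `imp` is a mock stand-in for `varImp(xgbTree_model)$importance` (a few values copied from the output in the question), `X_train` here is a made-up table whose column names match the importance row names, and the threshold of 10 is an arbitrary example.

```r
# Mock stand-in for varImp(xgbTree_model)$importance (assumption: the real
# object is a data.frame with variable names as row names and an Overall column)
imp <- data.frame(
  Overall = c(100.00, 78.50, 30.31, 27.49, 14.18, 8.34),
  row.names = c("OverallQual", "GrLivArea", "LotArea",
                "TotalBsmtSF", "Fireplaces", "Age")
)

# Hypothetical training table with one column per variable
set.seed(123)
X_train <- as.data.frame(
  matrix(rnorm(5 * 6), nrow = 5, dimnames = list(NULL, rownames(imp)))
)

# The variable names live in the row names, so they can be selected directly
keep <- rownames(imp)[imp$Overall > 10]

# Keep only the selected columns; intersect() guards against names that
# do not exist in the table
X_reduced <- X_train[, intersect(keep, colnames(X_train)), drop = FALSE]
print(colnames(X_reduced))
```

With the real model, replacing the mock `imp` with `varImp(xgbTree_model)$importance` should work the same way, provided the importance row names match the column names of the table being subset.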
Jagge