I am following this example to use lime on a supervised text model: https://rdrr.io/github/thomasp85/lime/man/lime.html
The only thing I have changed is the get_matrix function that creates the DTM. The new function works on the data from the linked example, but not on my real data, where I get this error:
Error in glmnet(x[, c(features, j), drop = FALSE], y, weights = weights, : x should be a matrix with 2 or more columns
The code I use is below. The data and analysis are made up for this question, but they reproduce the error I get on my real data (where I have 1000 text documents instead of 10):
library(lime)
library(text2vec)
library(xgboost)
library(tm)  # assumed source of stopwords(); the stopwords package also provides one
data <- data.frame(articles = c("Prince Harry proposed to Meghan", "Football transfer rumours Chelsea David Luiz", "Football transfer rumours Chelsea David Luiz",
    "World Cup team by team guide", "Destiny free trial goes live today", "What happens today ahead of crucial vote",
    "Story image for sport news football from BBC Sport", "Premier League news conferences", "What is Meghan Markles engagement ring", "Harry and Megan"),
  topic = c("other", "sport", "sport", "sport", "other", "other", "sport", "sport", "other", "other"))
data$articles <- as.character(data$articles)
data$topic <- as.character(data$topic)
data_train <- data[1:6, ]
data_test <- data[6:10, ]
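my_stop_word <- c(stopwords(), "one", "two", "three")
# My replacement for get_matrix from the linked example
get_matrix <- function(text) {
  # Lower-case and tokenize the input text
  it <- itoken(text, tolower, progressbar = FALSE)
  # Build the vocabulary from the text passed in, dropping stop words
  vocab2 <- create_vocabulary(it, stopwords = my_stop_word)
  vectorizer <- vocab_vectorizer(vocab2)
  create_dtm(it, vectorizer = vectorizer)
}
dtm_train <- get_matrix(data_train$articles)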
xgb_model <- xgb.train(list(max_depth = 7, eta = 0.1, objective = "binary:logistic",
                            eval_metric = "error", nthread = 1),
                       xgb.DMatrix(dtm_train, label = data_train$topic == "sport"),
                       nrounds = 50)
sentences <- head(data_test[data_test$topic == "sport", "articles"], 1)
explainer <- lime(data_test$articles, xgb_model, get_matrix)
explanations <- explain(sentences, explainer, n_labels = 1, n_features = 2)
Error: Error in glmnet(x[, c(features, j), drop = FALSE], y, weights = weights, : x should be a matrix with 2 or more columns
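In case it helps narrow this down, here is a minimal sketch of a check on the changed function, assuming only text2vec is loaded. get_matrix_check is a hypothetical, stripped-down copy of my get_matrix (stop words left out so it runs on its own). Since the vocabulary is created from whatever text is passed in, two calls on different text should give matrices with different columns; whether that is related to the glmnet error is only a guess.

```r
library(text2vec)

# Hypothetical stripped-down copy of get_matrix (no stop-word list),
# used only to inspect the columns it produces.
get_matrix_check <- function(text) {
  it <- itoken(text, tolower, progressbar = FALSE)
  vocab <- create_vocabulary(it)            # vocabulary depends on `text`
  create_dtm(it, vectorizer = vocab_vectorizer(vocab))
}

a <- get_matrix_check(c("Football transfer rumours", "World Cup guide"))
b <- get_matrix_check("Premier League news conferences")

# Compare the column sets of the two matrices
print(colnames(a))
print(colnames(b))
print(identical(colnames(a), colnames(b)))
```

The linked example builds its matrix with hash_vectorizer() instead, which does not depend on a vocabulary built from the input.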
Thank you!