I'm building text classifiers for a text analytics project. The models train without any problems, but when I call `predict()` on new data I get an error that I do not understand. My code is below.

#Load packages
library(caret)
library(tm)
library(SnowballC)
library(stringr)

#Training data.
data.t <- data[1:60]
data.t <- data.t[c(24,25,26)]
data.t[] <- lapply(data.t, str_trim)
is.na(data.t) <- data.t==''
data.t <- na.omit(data.t)
corpus <- VCorpus(VectorSource(data.t$PurpCk1))

##Create a document-term matrix (one row per document, one column per term).
tdm <- DocumentTermMatrix(corpus, control = list(removePunctuation = TRUE, tolower = TRUE, stopwords = TRUE, stemming = FALSE, removeNumbers = FALSE))
##Convert to a plain matrix; it becomes a data.frame with a class label (factor) for each document below.
train <- as.matrix(tdm)
#Code the outcome: 1 for documents in the "CSManipulation" condition, 0 otherwise.
cond.code <- ifelse(data.t$Condition == "CSManipulation", 1, 0)
train <- cbind(train, cond.code)
colnames(train)[ncol(train)] <- 'y'
train <- as.data.frame(train)
train$y <- as.factor(train$y)

##Train.
require(foreach)
registerDoSEQ()
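#Cross-validation (7 folds) estimates out-of-sample performance, which helps reveal over-fitting.
#preProcOptions only passes options to preProcess(); PCA itself would be requested via preProcess = "pca" in train().
tc <- trainControl(method = "cv", number = 7, verboseIter=FALSE, allowParallel=TRUE)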
bayesglm <- train(y ~ ., data = train, method = 'bayesglm', trControl=tc)
rf <- train(y ~ ., data = train, method = 'rf', trControl=tc)
NN <- train(y ~ ., data = train, method = 'nnet', trControl=tc, verbose=FALSE)
svml <- train(y ~ ., data = train, method = 'svmLinear', trControl=tc)
logitboost <- train(y ~ ., data = train, method = 'LogitBoost', trControl=tc)

#This is used to compare the models against one another.
model <- c("Bayes GLM", "Neural Net", "SVM (linear)", "LogitBoost")
Accuracy <- c(max(bayesglm$results$Accuracy),
          max(NN$results$Accuracy),
          max(svml$results$Accuracy),
          max(logitboost$results$Accuracy))
Kappa <- c(max(bayesglm$results$Kappa),
       max(NN$results$Kappa),
       max(svml$results$Kappa),
       max(logitboost$results$Kappa))
#data.frame keeps Accuracy and Kappa numeric; cbind would coerce them to character.
performance <- data.frame(model, Accuracy, Kappa)
knitr::kable(performance)
#Create new dataset and fit models to new data.
data.n <- data[1:60]
data.n <- data.n[c(24,25,26)]
data.n <- rbind(data.t,data.n)
data.n[] <- lapply(data.n, str_trim)
is.na(data.n) <- data.n==''
data.n <- na.omit(data.n)
data.n <- unique(data.n)
data.n <- data.n$PurpCk1
data.n <- as.data.frame(data.n)

This is the portion of code that returns the error.

#Fit model to new data
predict(svml,data.n)
Error in eval(predvars, data, env): object 'active' not found.
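
The message suggests that `active` is one of the term columns the model was trained on, and that `predict()` cannot find a column of that name in the new data. A minimal sketch in base R that reproduces the same kind of failure is shown here; `toy.train`, `toy.new`, and `fit` are hypothetical stand-ins, and `glm` is used only to keep the example free of extra packages.

```r
# Toy training frame: two "term" columns and a binary outcome.
# All names are hypothetical stand-ins, not the data from the question.
toy.train <- data.frame(
  active = c(1, 0, 2, 0, 1, 0, 1, 0),
  other  = c(0, 1, 0, 2, 1, 0, 0, 1),
  y      = c(1, 0, 1, 0, 0, 1, 0, 1)
)
fit <- glm(y ~ ., data = toy.train, family = binomial)

# New data holding raw text in a single column, much like data.n above.
toy.new <- data.frame(data.n = c("an active person", "something else"))

# predict() looks up every predictor named in the formula inside newdata,
# so the missing 'active' column triggers the error.
msg <- tryCatch(
  {
    predict(fit, newdata = toy.new)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

If this reading is right, the new data has to be converted into the same term-count columns as the training data before it is passed to `predict()`.
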
  • Is there a column named "active" in your `train` data.frame? You are expected to provide a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input data so we can run and test the code. Also, make the code you share as minimal as possible. Remove anything not directly related to your question. – MrFlick Jul 13 '17 at 17:40
  • I think your comments helped me to figure this out – though I'm still not out of the woods yet. I think the problem is with my corpus. The corpus that I'm applying to new data may not have all the words to make the appropriate classifications if that makes sense. So when I fit my model to new data, there are words that were not in the original corpus that cannot be analyzed, which results in the error being generated. – Chris Castille Jul 13 '17 at 17:55
  • Hi and welcome to Stack Overflow. Please take the time to go through the [welcome tour](https://stackoverflow.com/tour) to learn your way around (and to earn your first badge), read how to create a [Minimal, Complete, and Verifiable example](https://stackoverflow.com/help/mcve), and check [How to Ask Good Questions](https://stackoverflow.com/help/how-to-ask) to increase your chances of getting feedback and useful answers. – DarkCygnus Jul 13 '17 at 18:21
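
One way to handle the vocabulary mismatch discussed in the comments is to force the new data into exactly the term columns seen during training: terms missing from the new text get a count of zero, and terms never seen in training are dropped. The sketch below does this in base R on a toy matrix. `align_terms` is a hypothetical helper, not part of tm or caret, and the term names are made up. tm's `dictionary` control option for `DocumentTermMatrix` may be another route to the same result.

```r
# Hypothetical helper: return new-data term counts with exactly the
# training columns, in the training order.
align_terms <- function(new.counts, train.terms) {
  new.counts <- as.matrix(new.counts)
  # Start from all zeros so training terms absent from the new text stay 0.
  aligned <- matrix(0, nrow = nrow(new.counts), ncol = length(train.terms),
                    dimnames = list(rownames(new.counts), train.terms))
  # Copy counts only for terms that occur in both vocabularies.
  shared <- intersect(train.terms, colnames(new.counts))
  if (length(shared) > 0) {
    aligned[, shared] <- new.counts[, shared, drop = FALSE]
  }
  as.data.frame(aligned)
}

# Toy example: "novel" was never seen in training, "calm" and "team"
# do not occur in the new documents.
train.terms <- c("active", "calm", "team")
new.counts <- matrix(c(1, 0,
                       2, 3),
                     nrow = 2, byrow = TRUE,
                     dimnames = list(NULL, c("active", "novel")))
new.x <- align_terms(new.counts, train.terms)
print(new.x)
```

With the objects from the question this would look roughly like `predict(svml, align_terms(as.matrix(new.dtm), colnames(tdm)))`, where `new.dtm` is an assumed document-term matrix built from the new text with the same preprocessing options as `tdm`.
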
