I have customer feedback data in a CSV with two columns: "rec", which records whether the customer recommended the service they received (1 or 0), and "comment", the associated free-text comment. I am trying to compare the feedback of customers who recommended the service with that of customers who did not.
Using the tm package, I was able to read a CSV containing only the comments and do some follow-on text mining on all of them. This worked:
```r
file_loc <- "C:/Users/..(etc)...file.csv"
x <- read.csv(file_loc, header = TRUE)
require(tm)
fdbk <- Corpus(DataframeSource(x))
```
Now I am trying to compare the comments of customers who recommended the service with those of customers who did not, so the CSV also includes the "rec" column. However, I have not been able to create a corpus from just the "comment" column of the CSV. I tried the following:
```r
file_loc <- "C:/Users/..(etc)...file.csv"
x <- read.csv(file_loc, header = TRUE)
require(tm)
fdbk <- Corpus(DataframeSource(x$comment))
```
But I get this error:

```
Error in if (vectorized && (length <= 0)) stop("vectorized sources must have positive length") :
  missing value where TRUE/FALSE needed
```
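One likely explanation: `DataframeSource()` is meant to read a whole data frame, but `x$comment` is a single vector, so it presumably cannot determine a document count (`nrow()` of a plain vector is `NULL`), which would produce the missing value in the error. tm's `VectorSource()` takes a character vector with one document per element. The minimal sketch below relies on that. The small data frame stands in for the CSV; its contents and the names `corpus_rec` and `corpus_norec` are made up for illustration.

```r
library(tm)

# Made-up stand-in for read.csv(file_loc, header = TRUE, stringsAsFactors = FALSE);
# stringsAsFactors = FALSE keeps comments as character in R versions before 4.0
x <- data.frame(
  rec = c(1, 0, 1, 0),
  comment = c("Great service, very friendly staff",
              "Long wait and nobody called back",
              "Quick and helpful",
              "Rude on the phone"),
  stringsAsFactors = FALSE
)

# VectorSource() treats each element of the character vector as one document
fdbk <- Corpus(VectorSource(x$comment))

# One corpus per group, so the two sets of comments can be mined separately
corpus_rec   <- Corpus(VectorSource(x$comment[x$rec == 1]))
corpus_norec <- Corpus(VectorSource(x$comment[x$rec == 0]))
```

Each group corpus can then go through the same preprocessing and text-mining steps as `fdbk`, and the results compared side by side.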
I also tried binding the "rec" codes to the comments after creating a topic model, but certain comments get filtered out by the "topic" function, so the "rec" column ends up longer than the number of documents in the resulting topic model.
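Assuming the comments are dropped because no terms remain in them after preprocessing (for example, a comment made only of stopwords), which is a common reason rows are removed before fitting a topic model, one way to keep "rec" aligned is to compute a single logical index of documents to keep and apply it to both the document-term matrix and the "rec" codes. This is a minimal sketch with made-up data; `keep`, `dtm_kept`, and `rec_kept` are hypothetical names, and the actual filtering rule of the "topic" function may differ. It uses `slam::row_sums()`, since slam is a dependency of tm.

```r
library(tm)

# Made-up data: the second comment consists only of stopwords
x <- data.frame(
  rec = c(1, 0, 1),
  comment = c("Great service", "the and of", "Slow response"),
  stringsAsFactors = FALSE
)

corpus <- VCorpus(VectorSource(x$comment))
corpus <- tm_map(corpus, removeWords, stopwords("english"))
dtm <- DocumentTermMatrix(corpus)

# Documents with no remaining terms (row sum of 0) are the ones to drop
keep <- slam::row_sums(dtm) > 0

# Apply the same index to the matrix and to the "rec" codes
dtm_kept <- dtm[which(keep), ]
rec_kept <- x$rec[keep]
```

`rec_kept` then lines up row for row with `dtm_kept`, so it can be paired with the per-document results of a topic model fitted on `dtm_kept`.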
Is this something I can do simply with the tm package? I haven't worked with the qdap package at all; would it be more appropriate here?