I have customer feedback data in a CSV with two columns: "rec", which records whether the customer recommended the service they received (1 or 0), and "comment", the associated comment. I am trying to compare the feedback from those who recommended the service with the feedback from those who did not.

I have used the tm package to simply read all the lines in a CSV with only comments and do some follow-on text-mining on all the comments, which worked:

    file_loc <- "C:/Users/..(etc)...file.csv"
    x <- read.csv(file_loc, header = TRUE)
    require(tm)
    fdbk <- Corpus(DataframeSource(x))

Now I am trying to compare the comments of customers who recommend with those of customers who do not, by including the "rec" column, but I have not been able to create a corpus from a single column of the CSV. I tried the following:

    file_loc <- "C:/Users/..(etc)...file.csv"
    x <- read.csv(file_loc, header = TRUE)
    require(tm)
    fdbk <- Corpus(DataframeSource(x$comment))

But I get an error saying

"Error in if (vectorized && (length <= 0))
stop("vectorized sources must have positive length") : 
missing value where TRUE/FALSE needed"

I also tried binding the "rec" codes to the comments after creating a topic model, but certain comments end up getting filtered out by the "topic" function, so the "rec" column is longer than the number of documents in the resulting topic model.

Is this something I can do simply with the tm package? I haven't worked with the qdap package at all, but would that be more appropriate here?

Jilber Urbina
user2407054
  • What happens if you try `VectorSource` instead of `DataframeSource`? – Ben Aug 05 '13 at 15:14
  • VectorSource will read in the single column, but I still have the same problem: my final list of documents after the "topics" function is shorter than the originally associated "rec" column, i.e. I get the message "Your group membership length doesn't match omega." – user2407054 Aug 05 '13 at 15:30
  • Can you post your data or some example data? And clarify exactly what you mean by 'compare'? What exactly do you want to do here? – Ben Aug 05 '13 at 15:36
  • The data is two columns in a CSV that look like the following: – user2407054 Aug 05 '13 at 15:38
  • Edit your Question and paste in `head(dput(x))` – Ben Aug 05 '13 at 15:46
  • The data is two adjacent columns in a CSV: one column, "rec", which is just one digit per cell, e.g. 1,0,1,0,0,1,1,1, etc., and a second column, "comment", which is simply one short phrase per cell, e.g. "Great experience", "I had trouble finding the location", etc. – user2407054 Aug 05 '13 at 15:48
  • So what exactly is the question here? You mention something about a topic model... what exactly do you want to do with `comments$rec` and your topic model? Consider deleting this question and asking another, more focused and specific question that has a [reproducible example](http://stackoverflow.com/q/5963269) – Ben Aug 05 '13 at 16:53
  • Sorry for not explaining very well. For example, I would want to be able to color-code my topic model by whether the customer recommended the service or not, to visualize the difference in topics/trends between these two groups, but I can't do this unless I have the 0/1 "recommend" coding paired with each comment throughout the process. That said, I can ask another question showing this all more clearly. – user2407054 Aug 06 '13 at 19:44
  • Please do! I'll keep an eye out for it. The more effort and specific detail you include in your question, the quicker and more helpful the answers will be. – Ben Aug 06 '13 at 20:06

1 Answer

As Ben mentioned, use `VectorSource` on the comment column:

    library(tm)
    vec <- as.character(x[, "comment"])  # the column that holds the comments
    fdbk <- Corpus(VectorSource(vec))
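
For context on the error in the question: `x$comment` pulls the column out as a plain vector, not a data frame, so it has no row count for a data-frame source to work with. That is presumably what trips the length check quoted in the error message. The base R sketch below only illustrates the difference between the two ways of selecting the column; the two-row data frame is made-up example data, not the asker's file.

```r
# Made-up example data standing in for the CSV read by read.csv().
x <- data.frame(
  rec = c(1, 0),
  comment = c("Great experience", "I had trouble finding the location"),
  stringsAsFactors = FALSE
)

col_vec <- x$comment     # `$` drops the data frame and returns a vector
col_df  <- x["comment"]  # single brackets keep a one-column data frame

is.data.frame(col_vec)   # FALSE
nrow(col_vec)            # NULL: a vector has no rows
is.data.frame(col_df)    # TRUE
nrow(col_df)             # 2
```

As far as I know, newer tm releases also expect a `DataframeSource` input to have `doc_id` and `text` columns, so `VectorSource` remains the simpler route for a single column of text.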

Attaching a customer id as metadata might also be useful.
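
A minimal sketch of the metadata idea, assuming the tm package is installed and using made-up example data. It stores the "rec" code (a customer id could be attached the same way) as document-level metadata with `meta()`, so the code travels with each comment when the corpus is subset. `VCorpus` is used here instead of `Corpus` as a deliberate choice for the sketch, not something the original answer specifies.

```r
library(tm)

# Made-up example data standing in for the CSV.
x <- data.frame(
  rec = c(1, 0, 1),
  comment = c("Great experience",
              "I had trouble finding the location",
              "Friendly staff"),
  stringsAsFactors = FALSE
)

# One document per comment.
corp <- VCorpus(VectorSource(x$comment))

# Attach the 0/1 code as document-level metadata.
meta(corp, "rec") <- x$rec

# Subsetting the corpus carries the metadata along with the documents.
recommended     <- corp[meta(corp)$rec == 1]
not_recommended <- corp[meta(corp)$rec == 0]
```

Each sub-corpus can then be mined separately and the results compared.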

Hope this helps.
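
On the follow-up problem from the comments, where "rec" ends up longer than the list of documents that survive filtering, one option is to match by a document id instead of by position. The base R sketch below is only an illustration: the data are made up, and the "fewer than two words" rule is a hypothetical stand-in for whatever the topic function actually drops.

```r
# Made-up example data standing in for the CSV.
x <- data.frame(
  rec = c(1, 0, 1, 0),
  comment = c("Great experience", "the", "Friendly staff",
              "I had trouble finding the location"),
  stringsAsFactors = FALSE
)

# Give every row a stable id and name both vectors by it.
ids  <- paste0("doc", seq_len(nrow(x)))
docs <- setNames(x$comment, ids)
rec  <- setNames(x$rec, ids)

# Hypothetical filtering step: drop comments with fewer than two words.
n_words <- lengths(strsplit(docs, "\\s+"))
kept    <- docs[n_words >= 2]

# Look "rec" up by the surviving ids rather than by position,
# so the two stay the same length and in the same order.
rec_kept <- rec[names(kept)]
```

The same lookup should work with any step that preserves document names, since `names(kept)` records which comments survived.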

holzben