I am trying to use the tm package on a data frame of texts, but this error keeps appearing: "Error in if (vectorized && (length <= 0)) stop("vectorized sources must have positive length") : missing value where TRUE/FALSE needed"

I have a data frame that looks like this:

     person sex adult                                 state code
1         sam   m     0         Computer is fun. Not too fun.   K1
2        greg   m     0               No it's not, it's dumb.   K2
3     teacher   m     1                    What should we do?   K3
4         sam   m     0                  You liar, it stinks!   K4
5        greg   m     0               I am telling the truth!   K5
6       sally   f     0                How can we be certain?   K6
7        greg   m     0                      There is no way.   K7
8         sam   m     0                       I distrust you.   K8
9       sally   f     0           What are you talking about?   K9
10 researcher   f     1         Shall we move on?  Good then.  K10
11       greg   m     0 I'm hungry.  Let's eat.  You already?  K11

This is the only code I run:

library(tm)
texts <- as.data.frame(texts)
mycorpus<- Corpus(DataframeSource(texts))

Does anyone have an idea what is going wrong here? Many thanks in advance!

rdatasculptor
    Please add some [example data](http://stackoverflow.com/q/5963269/1036500) to your question, that will make it easier for people to help you. Try editing your question and pasting in `dput(head(texts))`. – Ben Apr 23 '13 at 22:31

2 Answers

Hope this is what you are looking for:

xkcd.df <- read.csv(file.path(path, datafiles))
xkcd.corpus <- Corpus(DataframeSource(data.frame(xkcd.df[, 3])))
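
For the data frame in the question, a comparable sketch might look like the one below. It rests on two assumptions: that `state` holds the text and `code` the K1, K2, ... identifiers (which is how the printed header reads), and that a reasonably recent tm release is installed, where, as far as I know, `DataframeSource()` expects the first two columns to be named `doc_id` and `text`. The sample rows and the `docs` name are only for illustration.

```r
library(tm)

# Small stand-in for the question's data frame (first three rows only).
texts <- data.frame(
  person = c("sam", "greg", "teacher"),
  state  = c("Computer is fun. Not too fun.",
             "No it's not, it's dumb.",
             "What should we do?"),
  code   = c("K1", "K2", "K3"),
  stringsAsFactors = FALSE
)

# Assumption: recent tm versions want `doc_id` and `text` as the first
# two columns; any further columns are carried along with the documents.
docs <- data.frame(
  doc_id = texts$code,
  text   = texts$state,
  person = texts$person,
  stringsAsFactors = FALSE
)

mycorpus <- Corpus(DataframeSource(docs))
length(mycorpus)  # number of documents in the corpus
```
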
user3117837

It sounds like you need to make a corpus out of your column of text (which appears to be merged with the state code column; if so, you would need to separate them first). Assuming that state code is the column you want to use with the tm package, you should pull that column, not the entire data frame, into a corpus, if I'm not mistaken. Using the info you provided, your code should look something like this:

mycorpus <- Corpus(VectorSource(texts$`state code`))

If you do need to separate the text from the state code, then, assuming that "text" is your new column:

mycorpus <- Corpus(VectorSource(texts$text))
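
As a self-contained sketch of the same idea: if the printed data frame actually has separate `state` and `code` columns (an assumption based on the header in the question), the text column can be passed to `VectorSource()` directly. The sample rows are a stand-in for the real data, and `as.character()` is there only in case the column was read in as a factor.

```r
library(tm)

# Stand-in for the question's data frame (first three rows only).
texts <- data.frame(
  person = c("sam", "greg", "teacher"),
  state  = c("Computer is fun. Not too fun.",
             "No it's not, it's dumb.",
             "What should we do?"),
  code   = c("K1", "K2", "K3"),
  stringsAsFactors = FALSE
)

# Build the corpus from the text column alone, one document per row.
# as.character() guards against the column having been stored as a factor.
mycorpus <- Corpus(VectorSource(as.character(texts$state)))
length(mycorpus)  # number of documents in the corpus
```
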
Robert