I have a corpus, created with the tm package, that consists of many documents. I want to use the stringr function str_detect on my documents to see whether a document contains strings from another document. The output I want is a list of TRUE/FALSE values showing whether each document matches every other document in the corpus. Here is a sample of the code, using the crude dataset from the tm package:

library(tm)
library(stringr)
data("crude")
for (i in 1:length(crude)) {
    text <- crude[[i]]
    search <- str_detect(crude, text)
}

But when I run this, I get an error stating that the str_detect function is not applicable to plain text documents. So what I want to do is convert each document in the corpus into a separate character vector, so that str_detect can work.

I tried doing:

chr.vector <- as.character(crude) 

This returns one character vector comprising everything in my corpus, which is not what I want. So I am considering a for loop, but I have no idea how to present the output in a useful way.

for (i in 1:length(crude)) {
    x <- as.character(crude[[i]])

Can someone advise me on how to complete this code, or suggest a better way to approach the problem? Thanks!

Felicia
  • Please add what package this comes from and a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). What is the expected output? Will the `stringr` functions work on all the documents at once? If so, you might want to put everything into a list. – Roman Luštrik Jul 01 '16 at 07:21
  • @RomanLuštrik I have edited my code to, hopefully, make it more reproducible. I have also included the output I want. What do you mean by putting everything into a list? – Felicia Jul 01 '16 at 07:51
  • I think `x[i] <- as.character(corpus[[i]])` should work. – Ravi Jul 01 '16 at 08:05
  • @Ravi I tried using that in my for loop, but I get this warning message: number of items to replace is not a multiple of replacement length – Felicia Jul 01 '16 at 08:09
  • I tried it in my console, it's working fine – Ravi Jul 01 '16 at 08:12
  • You could try something like `chr.vector <- sapply(crude, content)` or `chr.vector <- sapply(crude, as, "character")`. – lukeA Jul 01 '16 at 08:50
  • @lukeA Thanks! This works, just that I have a warning message saying: longer object length is not a multiple of shorter object length. Will this be a problem? – Felicia Jul 04 '16 at 00:41
  • @Ravi Hmm.. I get an error saying that object 'x' is not found. Maybe I shall go explore this further. Thanks! – Felicia Jul 04 '16 at 00:43
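
Building on lukeA's `sapply(crude, content)` suggestion above, here is a minimal sketch of one way the pairwise check could be done. It is not a tested answer from the thread. The names `docs` and `hits` are illustrative only, and two choices are assumptions: each document is collapsed into a single string, and the pattern is wrapped in `fixed()` on the guess that a literal substring match is wanted rather than a regular-expression match (the articles contain characters such as `.` and `(` that a regex would interpret).

```r
library(tm)       # attaches NLP as well, which provides content()
library(stringr)
data("crude")

# Assumption: one string per document. Collapsing guards against
# documents whose content is stored as several lines.
docs <- vapply(crude,
               function(d) paste(content(d), collapse = " "),
               character(1))

# For each document j, test every document in the corpus against it.
# fixed() makes str_detect treat the pattern as literal text.
hits <- vapply(seq_along(docs),
               function(j) str_detect(docs, fixed(docs[[j]])),
               logical(length(docs)))

# hits[i, j] is TRUE when document i contains the full text of document j.
dimnames(hits) <- list(searched = names(docs), pattern = names(docs))
```

Each column `hits[, j]` is then the TRUE/FALSE vector for document j, and the diagonal is TRUE because every document contains itself. Since each pattern passed to `str_detect` has length 1, there is nothing to recycle, which may be why this form avoids the "longer object length" warning mentioned above, though that is only a guess at its cause.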

0 Answers