Convert each document in a corpus into a separate character vector

Question

I have a corpus created with the tm package that consists of many documents. I want to use the stringr function str_detect on my documents to see whether a document contains strings from another document. The output I want is a set of TRUE/FALSE vectors showing whether each document matches every other document in the corpus. Here's a sample of my code using the crude dataset from the tm package:

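library(tm)
library(stringr)
data("crude")  # a VCorpus of 20 Reuters news documents
for (i in 1:length(crude)) {
  text <- crude[[i]]  # a PlainTextDocument, not a character string
  # This call fails; even if it worked, search would be overwritten each iteration
  search <- str_detect(crude, text)
}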

But when I do this, I get an error stating that str_detect is not applicable to plain text documents. So what I want to do is convert each document in the corpus into a separate character vector so that str_detect can work.
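Once each document is available as an ordinary character string, the containment test itself is simple, but one detail matters: str_detect() and grepl() treat the pattern as a regular expression by default. News text is full of regex metacharacters such as `.` and `(`, so searching for one document inside another should be done literally (`fixed = TRUE` in base R, `fixed()` in stringr); an unbalanced parenthesis in a document would otherwise even make the pattern invalid. A minimal base-R sketch, with made-up strings standing in for extracted documents:

```r
# Hypothetical extracted documents (plain character strings)
doc_a <- "prices rose 2.5 pct"
doc_b <- "Analysts said prices rose 245 pct in one day."
doc_c <- "Analysts said prices rose 2.5 pct in one day."

# Regex matching: "." matches any character, so "2.5" also matches "245"
regex_b <- grepl(doc_a, doc_b)                  # TRUE, a false positive

# Literal matching: doc_a must appear verbatim
literal_b <- grepl(doc_a, doc_b, fixed = TRUE)  # FALSE
literal_c <- grepl(doc_a, doc_c, fixed = TRUE)  # TRUE

# stringr equivalent of the literal test (requires stringr):
# stringr::str_detect(doc_c, stringr::fixed(doc_a))
```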

I tried doing:

chr.vector <- as.character(crude)

It returns a single character vector containing everything in my corpus, which is not what I want. So I was considering using a for loop, but I have no idea how to collect and display the output in a sensible way.

for (i in 1:length(crude)) {
  x <- as.character(crude[[i]])
  # Incomplete: x is overwritten each iteration and nothing is compared yet
}

Can someone advise me on how to complete my code? Or is there a better way to approach this problem? Thanks!
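One way to complete the loop is to preallocate a logical matrix and fill one row per iteration, so that entry [i, j] is TRUE when document i appears verbatim inside document j; each row is then the TRUE/FALSE vector for one document. This is a sketch, not the only approach: it assumes chr.vector already holds one string per document (for instance from sapply(crude, content)), and the three placeholder strings exist only so the block runs without tm. The comparison uses base R's grepl(fixed = TRUE); the stringr equivalent is str_detect(chr.vector, fixed(chr.vector[i])).

```r
# Placeholder documents; with tm this could be chr.vector <- sapply(crude, content)
chr.vector <- c(a = "oil prices rose",
                b = "analysts said oil prices rose again",
                c = "opec meeting delayed")

n <- length(chr.vector)
# Preallocate: rows = document searched for, columns = document searched in
contains <- matrix(FALSE, nrow = n, ncol = n,
                   dimnames = list(names(chr.vector), names(chr.vector)))

for (i in seq_len(n)) {
  # Literal (non-regex) search for document i inside every document
  contains[i, ] <- grepl(chr.vector[i], chr.vector, fixed = TRUE)
}

print(contains)
contains["a", ]  # TRUE/FALSE vector for document "a"
```

Preallocating the matrix avoids growing an object inside the loop, and seq_len(n) behaves correctly even for an empty corpus, where 1:length(crude) would iterate over c(1, 0). The diagonal is always TRUE, since every document contains itself.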

Please add what package this comes from and a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). What is the expected output? Will the `stringr` functions work on all the documents at once? If so, you might want to put everything into a list. — Roman Luštrik, Jul 01 '16 at 07:21
@RomanLuštrik I have edited my code to, hopefully, make it more reproducible. I also included the output I want. What do you mean by putting everything into a list? — Felicia, Jul 01 '16 at 07:51
@Ravi I tried using that in my for loop, but I get this warning message: number of items to replace is not a multiple of replacement length — Felicia, Jul 01 '16 at 08:09
You could try something like `chr.vector <- sapply(crude, content)` or `chr.vector <- sapply(crude, as, "character")`. — lukeA, Jul 01 '16 at 08:50
@lukeA Thanks! This works, but I get a warning message saying: longer object length is not a multiple of shorter object length. Will this be a problem? — Felicia, Jul 04 '16 at 00:41
@Ravi Hmm... I get an error saying that object 'x' is not found. Maybe I'll explore this further. Thanks! — Felicia, Jul 04 '16 at 00:43
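On the warning from the sapply(crude, content) approach: "longer object length is not a multiple of shorter object length" is R's generic recycling warning, raised when two vectors are combined element by element and the longer length is not a multiple of the shorter one; stringr (through stringi) typically reports the same message when its string and pattern arguments have mismatched lengths. The comments don't show where it came from, so it is worth checking rather than ignoring. One plausible cause, and it is only an assumption, is a document whose content has more than one element, which would stop sapply() from returning a simple one-string-per-document vector. A base-R sketch that reproduces the warning, flags such documents, and collapses each one into a single string; `docs` is a hypothetical stand-in for lapply(crude, content):

```r
# R's recycling warning, reproduced with plain vectors of lengths 3 and 2
msg <- tryCatch(1:3 + 1:2, warning = function(w) conditionMessage(w))
print(msg)

# Hypothetical stand-in for lapply(crude, content): the second document
# spans two lines, so it is a character vector of length 2
docs <- list(doc1 = "crude oil prices fell",
             doc2 = c("opec ministers met", "on monday"))

# Which documents are not a single string?
multi <- names(docs)[lengths(docs) != 1]
print(multi)

# Collapse each document to exactly one string; vapply() guarantees a
# character vector with one element per document
chr.vector <- vapply(docs, paste, character(1), collapse = "\n")
print(chr.vector)
```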
