I have a corpus created with the tm package that consists of many documents. I want to use the stringr function str_detect on my documents to see whether a document contains strings from another document. The output I want is a list of TRUE/FALSE values showing whether each document matches every other document in the corpus. Here's a sample of the code using the crude dataset from the tm package:
library(tm)
library(stringr)
data("crude")
for (i in 1:length(crude)) {
  text <- crude[[i]]
  search <- str_detect(crude, text)
}
But when I run this, I get an error stating that the str_detect function is not applicable to plain text documents. So what I want to do is convert each document in the corpus into a separate character vector, so that str_detect can work.
I tried doing:
chr.vector <- as.character(crude)
It returns one character vector comprising everything in my corpus, which is not what I want. So I was considering a for loop, but I have no idea how to store and display the output in a good way.
for (i in 1:length(crude)) {
x <- as.character(crude[[i]])
Can someone advise me on how to complete this code? Or is there a better way to approach this problem? Thanks!
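For reference, here is a minimal sketch of the kind of result being asked for, not a definitive solution. It assumes that calling as.character() on a single document returns its text, and that the whole text of one document should be matched literally (via stringr's fixed()) inside the others. The names docs, hits and hits_list are made up for illustration.

```r
library(tm)
library(stringr)

data("crude")

# Collapse each document into one string. as.character() on a single
# document returns its text; paste() guards against multi-line content.
docs <- vapply(crude,
               function(d) paste(as.character(d), collapse = "\n"),
               character(1))

# hits[j, i] is TRUE when document j contains the full text of document i.
# fixed() makes str_detect() treat the text literally, not as a regex.
hits <- vapply(docs,
               function(pattern) str_detect(docs, fixed(pattern)),
               logical(length(docs)))
rownames(hits) <- names(docs)

# Assumption: a list with one logical vector per document is the wanted shape.
hits_list <- lapply(seq_along(docs), function(i) hits[, i])
names(hits_list) <- names(docs)
```

Printing hits shows the full document-by-document comparison as a logical matrix, while hits_list[[i]] gives the TRUE/FALSE vector for document i alone. Searching for shorter strings instead of whole documents would only change what is passed as the pattern.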