I'm using the tm package to run LDA on my corpus. I have a corpus containing 10,000 documents.
rtcorpus.4star <- Corpus(DataframeSource(rt.subset.4star)) ##creates the corpus
rtcorpus.4star[[1]] ##accesses the first document
I'm trying to write a piece of code that will add the word "specialword" after certain words. Essentially: for a vector of words that I choose (good, nice, happy, fun, love), I want the code to loop through each document and add the word "specialword" after every occurrence of any of these words.
So for example, given this document:
I had a really fun time
I want the result to be this:
I had a really fun specialword time
The issue is that I don't know how to get the code to read the text inside the corpus. I assume I need a for loop (or maybe not), but I'm not sure how to loop through each document in the corpus and each word in each document. I'm also wondering whether I can use something along the lines of a "translate" function that works with tm_map.
Edit:
I made some attempts. This code returns "test" as NA. Do you know why?
special <- c("poor", "lose")
for (i in special){
test <- gsub(special[i], paste(special[i], "specialword"), rtcorpus.1star[[1]])
}
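For what it's worth, the NA most likely comes from the indexing: `for (i in special)` makes `i` the word itself ("poor", then "lose"), so `special[i]` looks up an element by name in an unnamed vector and gives NA, and `gsub` with an NA pattern returns NA. The attempt also reassigns `test` from the original document on every pass, so only the last word would survive anyway. A minimal base-R sketch of both points, using a made-up string `doc` in place of a corpus document (an assumption for illustration):

```r
special <- c("poor", "lose")
doc <- "a poor showing, sure to lose"  # hypothetical stand-in for one document

# Indexing an unnamed vector by a string finds no such name and yields NA
lookup <- special["poor"]

# An NA pattern makes gsub return NA, which is what the attempt produced
broken <- gsub(lookup, paste(lookup, "specialword"), doc)

# Using the loop variable directly, and feeding each result into the next pass
fixed <- doc
for (w in special) {
  fixed <- gsub(w, paste(w, "specialword"), fixed)
}
```

Here `broken` is NA, while `fixed` has "specialword" inserted after both words.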
Edit: figured it out!! thanks
special <- c("poor", "lose")
## loop over the positions in special, so special[i] is the word itself
for (i in seq_along(special)) {
  rtcorpus.codewordtest <- gsub(special[i], paste(special[i], "specialword"), rtcorpus.codewordtest)
}
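The loop above can probably be collapsed into a single `gsub` call by joining the words into one alternation pattern. This is only a sketch on a plain character vector: `add_marker`, `targets` and `docs` are names made up for illustration, and the `\\b` word boundaries are an addition (not in the loop version) so that "fun" does not also match inside "funny".

```r
# Words that should be followed by the marker (taken from the question)
targets <- c("good", "nice", "happy", "fun", "love")

# add_marker is a hypothetical helper, not part of tm or base R
add_marker <- function(text, words, marker = "specialword") {
  # Build one pattern such as \b(good|nice|happy|fun|love)\b
  pattern <- paste0("\\b(", paste(words, collapse = "|"), ")\\b")
  # \\1 re-inserts the matched word, then the marker is appended
  gsub(pattern, paste0("\\1 ", marker), text, perl = TRUE)
}

docs <- c("I had a really fun time", "funny but not good")
marked <- add_marker(docs, targets)
```

The words are pasted into the pattern as-is, so any word containing regex metacharacters would need escaping first. With tm, the same function could presumably be applied per document through `tm_map` with `content_transformer`, though that is untested here.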