I need to remove all non-English words from a data frame that looks like this:
ID text
1 they all went to the store bonkobuns and bought chicken
2 if we believe no exomunch standards are in order then we're ok
3 living among the calipodians seems reasonable
4 given the state of all relimited editions we should be fine
I want to end up with a data frame like this:
ID text
1 they all went to the store and bought chicken
2 if we believe no standards are in order then we're ok
3 living among the seems reasonable
4 given the state of all editions we should be fine
I have a vector, word_vec, containing all English words.
Using the tm package, I can remove all words that are in a vector from a data frame:
library(tm)
for (k in seq_len(nrow(frame))) {
  for (i in seq_along(word_vec)) {
    # remove word_vec[i] from the text of row k
    frame$text[k] <- removeWords(frame$text[k], word_vec[i])
  }
}
But I want to do the opposite: keep only the words found in the vector.
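One possible way to do the "keep" direction is a minimal base-R sketch that does not use tm at all. It assumes that words are separated by whitespace, that punctuation needs no special handling, and that matching should ignore case. The word_vec below is a hypothetical stand-in for the real English word list, which isn't shown. Each text is split into words, only the words that appear in the dictionary are kept (via %in%), and the survivors are pasted back together.

```r
# Hypothetical stand-in for the real word_vec (the full English word list is not shown)
word_vec <- c("they", "all", "went", "to", "the", "store", "and", "bought",
              "chicken", "if", "we", "believe", "no", "standards", "are", "in",
              "order", "then", "we're", "ok", "living", "among", "seems",
              "reasonable", "given", "state", "of", "editions", "should", "be",
              "fine")

# The data frame from the question; stringsAsFactors = FALSE keeps text as character
frame <- data.frame(
  ID = 1:4,
  text = c("they all went to the store bonkobuns and bought chicken",
           "if we believe no exomunch standards are in order then we're ok",
           "living among the calipodians seems reasonable",
           "given the state of all relimited editions we should be fine"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep only the words of each string in x that occur in
# dictionary (case-insensitive, whitespace tokenisation, no punctuation handling)
keep_words <- function(x, dictionary) {
  dict <- tolower(dictionary)
  tokens <- strsplit(x, "\\s+")  # one character vector of words per row
  vapply(tokens,
         function(w) paste(w[tolower(w) %in% dict], collapse = " "),
         character(1))
}

frame$text <- keep_words(frame$text, word_vec)
print(frame)
```

Because %in% is built on match(), which uses hashing, this should scale reasonably to a large word_vec without a double loop. Text containing punctuation (for example "store,") would need extra cleaning before the lookup.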