I am looking to create a variable (column) in my data frame that identifies suspected meaningless text (e.g. "asdkjhfas"), or the inverse. This is part of a general script that will assist my team with cleaning survey data.
A function I found on Stack Overflow (link and credit below) lets me match single words against a dictionary, but it does not handle strings that contain multiple words.
Is there a way to do a partial match against the dictionary (rather than a strict one), so that a string containing at least one dictionary word is flagged as meaningful?
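library(qdapDictionaries) # install.packages("qdapDictionaries")
# TRUE only when the whole string is a single dictionary entry
is.word <- function(x) x %in% GradyAugmented
x <- c(1, 2, 3, 4, 5, 6)
y <- c("this is text", "word", "random", "Coca-cola", "this is meaningful asdfasdf", "sadfsdf")
df <- data.frame(x, y)
# Flag exact dictionary matches; unmatched rows are left as NA
df$z[is.word(df$y)] <- TRUE
df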
In a perfect world I would get a column: df$z TRUE TRUE TRUE TRUE TRUE NA
My actual results are: df$z NA TRUE TRUE NA NA NA
I would be more than happy with: df$z TRUE TRUE TRUE NA TRUE NA
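To make that fallback target concrete, here is a minimal sketch of the behaviour I mean, not a tested solution. It assumes a string counts as meaningful when at least one of its whitespace-separated tokens is in the dictionary. The helper name has_any_word and the tiny dict vector are hypothetical stand-ins so the snippet runs without the package; in practice GradyAugmented would be passed instead.

```r
# Hypothetical stand-in for GradyAugmented so the sketch is self-contained
dict <- c("this", "is", "text", "word", "random", "meaningful")

# Hypothetical helper: TRUE when at least one whitespace-separated token
# of each string appears in the dictionary (comparison is lower-cased)
has_any_word <- function(x, dictionary) {
  tokens <- strsplit(tolower(as.character(x)), "[[:space:]]+")
  vapply(tokens, function(tok) any(tok %in% dictionary), logical(1))
}

y <- c("this is text", "word", "random", "Coca-cola",
       "this is meaningful asdfasdf", "sadfsdf")

flags <- has_any_word(y, dict)
# Keep the TRUE / NA convention used for df$z
z <- ifelse(flags, TRUE, NA)
z
```

With this stand-in dictionary, "Coca-cola" is not flagged because the hyphenated token is compared as a whole. Punctuation is not stripped, so a token such as "text," would also fail to match.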
I found the function is.word in the Stack Overflow question "Remove meaningless words from corpus in R", thanks to user parth.