Basically I have my bag of words:
library(tm)
source <- VectorSource(text)  # text: character vector of documents
corpus <- Corpus(source)
corpus <- tm_map(corpus, content_transformer(tolower))  # lowercase every document
dtm <- DocumentTermMatrix(corpus)
etc etc.
And I have a data frame consisting of just two columns, which I pulled from a SQLite DB. Column 1 is a list of hundreds of words, and Column 2 is each word's corresponding part-of-speech (POS) code.
I am trying to match every token in my dtm to the identical term in column 1 of the data frame, so that each token can then be matched to its corresponding POS code. Essentially, the data frame is like a dictionary, and I want to match each token in my dtm to its definition.
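To make the goal concrete, here is a minimal sketch of the kind of exact lookup described above, using only base R. The names `terms`, `pos_dict`, `word`, `pos`, and the sample POS codes are hypothetical stand-ins: in the real case the terms would presumably come from the dtm (for example via `Terms(dtm)`) and the dictionary from the SQLite query.

```r
# Hypothetical stand-in for the dtm vocabulary, e.g. terms <- Terms(dtm)
terms <- c("run", "quickly", "dog", "zzz")

# Hypothetical stand-in for the two-column data frame from SQLite
pos_dict <- data.frame(
  word = c("dog", "run", "quickly"),
  pos  = c("NN", "VB", "RB"),
  stringsAsFactors = FALSE
)

# match() gives, for each term, the row of the identical word in the
# dictionary, or NA when the term is not listed. It compares whole strings,
# so it behaves differently from grep-style pattern matching.
idx <- match(terms, pos_dict$word)

# One row per term, with its POS code (NA for terms missing from the dictionary)
tagged <- data.frame(term = terms, pos = pos_dict$pos[idx],
                     stringsAsFactors = FALSE)
print(tagged)
```

Terms that are not in the dictionary simply get `NA` as their POS code, so they can be inspected or filtered afterwards with `is.na(tagged$pos)`.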
I tried a bunch of grep functions to do this, but to no avail. Does anyone have thoughts on the best way to approach this?
Thanks!