I'm dealing with a text mining task, and today I have a problem with stemming. I have several paragraphs in the following format. They are character objects, not lists or Corpus objects from the tm package.
[1] " andres oppenheimer intelectuales influyentes latinoamerica segun revista foreign policy editor columnista miami herald sigue recorriendo continente presenta reportajes cnn tradicional ciclo periodistico argentina presentando libro salvese pueda analiza futuro mundo automatizacion robotizacion "
I have a dictionary (lexicon) of words that have to be matched against the text above. The problem is that I couldn't get it to work with the stemming approach. My code is the following:
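library(xlsx)  # provides read.xlsx() with the sheetName/colIndex arguments
lexicon <- read.xlsx("lexicon nf.xlsx", sheetName = "lex", colIndex = 1, header = TRUE)
lexicon$palabra <- as.character(lexicon$palabra)
# Longest entries first, joined with "|^" (note: "^" anchors to the start of the whole string)
match <- paste(lexicon$palabra[order(-nchar(lexicon$palabra))], collapse = "|^")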
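To see why the `|^` version does not behave as intended, here is a small base-R sketch. The lexicon entries in `palabras` are hypothetical stand-ins for `lexicon$palabra`, and `texto` is an excerpt of the paragraph above. Pasting with `collapse = "|^"` puts `^` only *between* entries, so the first alternative has no anchor at all, and every `^` anchors to the start of the whole string rather than to the start of each word.

```r
# Hypothetical lexicon entries standing in for lexicon$palabra
palabras <- c("automatiz", "periodist", "robotiz")
# Excerpt of the paragraph above (note the leading space)
texto <- " tradicional ciclo periodistico argentina presentando libro salvese pueda analiza futuro mundo automatizacion robotizacion "

# Same construction as in the question: longest entries first, joined with "|^"
patron <- paste(palabras[order(-nchar(palabras))], collapse = "|^")
print(patron)  # "automatiz|^periodist|^robotiz"

# Only the unanchored first alternative can match anywhere; "^periodist" and
# "^robotiz" could only match at position 1 of the whole string
hallados <- regmatches(texto, gregexpr(patron, texto, perl = TRUE))[[1]]
print(hallados)  # "automatiz"
```

With the real lexicon the effect is the same: only the first (longest) entry can match, and it matches anywhere, even inside a word; the remaining entries would match only if the paragraph began with them, which it never does here because of the leading space.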
If I try:
match <- paste(lexicon$palabra[order(-nchar(lexicon$palabra))], collapse = "|")
It matches the lexicon entries at any position, including inside other words, but this is not what I want: each entry should only match at the start of a word. I know that if I split the paragraph into words (for instance, on spaces) I can do the match the way I need, but that is a more complicated approach. I would like to do it directly on the paragraph, without turning it into a Corpus object.
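One possible way to get word-start (prefix) matching directly on the character string, without splitting it or building a Corpus, is a single word-boundary anchor `\\b` placed in front of a group of alternatives. This is a sketch in base R only, not a tested solution for the real lexicon: `palabras` is again hypothetical (`"cion"` is included to show that an entry is not matched inside a word), and it assumes the entries contain no regex metacharacters; otherwise they would need escaping first.

```r
# The paragraph from the question
texto <- " andres oppenheimer intelectuales influyentes latinoamerica segun revista foreign policy editor columnista miami herald sigue recorriendo continente presenta reportajes cnn tradicional ciclo periodistico argentina presentando libro salvese pueda analiza futuro mundo automatizacion robotizacion "
# Hypothetical lexicon entries (stems); "cion" only occurs inside words
palabras <- c("robotiz", "periodist", "automatiz", "cion")

# Longest entries first, wrapped in a non-capturing group so that the single
# \\b anchor applies to every alternative
alternativas <- palabras[order(-nchar(palabras))]
patron <- paste0("\\b(?:", paste(alternativas, collapse = "|"), ")")

# Append \\w* to return the complete words that start with a lexicon entry
hallados <- regmatches(texto, gregexpr(paste0(patron, "\\w*"), texto, perl = TRUE))[[1]]
print(hallados)  # "periodistico" "automatizacion" "robotizacion"

# Which lexicon entries occur at the start of at least one word
presentes <- vapply(palabras,
                    function(p) grepl(paste0("\\b", p), texto, perl = TRUE),
                    logical(1))
print(presentes)
```

If the real text contains accented letters (for example "información"), PCRE's `\\b` and `\\w` may treat them as non-word characters by default; prefixing the pattern with `(*UCP)` is one option worth checking in that case.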
Any idea? Thank you very much for your help!