I would like to use the WordNet lemmatizer to lemmatize the words in the character vector a:

> a<-c("He saw a see-saw on a sea shore", "she is feeling cold")
> a
[1] "He saw a see-saw on a sea shore" "she is feeling cold"  

I convert a into a corpus and apply pre-processing steps (stopword removal, lemmatization, etc.):

> a <- Corpus(VectorSource(a))

I wanted to do the lemmatization as follows:

> filter <- getTermFilter("ExactMatchFilter", a, TRUE)
> terms <- getIndexTerms("NOUN", 1, filter)
> sapply(terms, getLemma)

but I get this error

> filter <- getTermFilter("ExactMatchFilter", a, TRUE)
Error in .jnew(paste("com.nexagis.jawbone.filter", type, sep = "."), word,  : 
  java.lang.NoSuchMethodError: <init>

My goal is to lemmatize the whole corpus rather than a single word. How can this be accomplished?

agstudy
user1946217
  • Not entirely sure about using R for interacting with WordNet or any NLP facility, but what I'd do here is use rpy to accomplish the R business and use NLTK for the WordNet/lemmatization stuff. Granted this works UNLESS your code HAS to be in R for some reason. – dmn Feb 25 '13 at 20:27

1 Answer

Put your code in a loop. You can try something like this:

       lapply(a, function(x) {
            # build a filter for this element, then look up matching noun index terms
            x.filter <- getTermFilter("ExactMatchFilter", x, TRUE)
            terms <- getIndexTerms("NOUN", 1, x.filter)
            sapply(terms, getLemma)
         })
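
Note that getTermFilter(type, word, ignoreCase) takes a single word, so passing a whole sentence (or a Corpus object) is unlikely to match any index term. A minimal sketch of a word-by-word variant follows, assuming the wordnet package can find an installed WordNet dictionary and that the input is the original character vector rather than the Corpus. The names docs, tokenize and lookup_lemma are hypothetical helpers introduced for illustration. The lookup is an exact dictionary match for one part of speech, not full morphological lemmatization.

```r
library(wordnet)  # assumes rJava and a WordNet dictionary are installed and discoverable

# Assumption: work on the plain character vector, not the tm Corpus
docs <- c("He saw a see-saw on a sea shore", "she is feeling cold")

# Hypothetical helper: lower-case a document and split it into words,
# keeping hyphenated words such as "see-saw" together
tokenize <- function(doc) {
  words <- strsplit(tolower(doc), "[^a-z-]+")[[1]]
  words[nzchar(words)]
}

# Hypothetical helper: return the lemma of the first matching index term,
# or the word itself when WordNet has no exact match for that part of speech
lookup_lemma <- function(word, pos = "NOUN") {
  filter <- getTermFilter("ExactMatchFilter", word, TRUE)
  terms <- getIndexTerms(pos, 1, filter)
  if (is.null(terms) || length(terms) == 0) return(word)
  getLemma(terms[[1]])
}

# One character vector of lemmas per document
lemmas <- lapply(docs, function(doc) {
  vapply(tokenize(doc), lookup_lemma, character(1), USE.NAMES = FALSE)
})
```

Falling back to the original word keeps each document the same length as its token list, which makes it easier to rebuild the text afterwards.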
agstudy
  • For some reason this code with OP's data gives me a list with two elements that are empty lists. What am I missing? – expert May 05 '17 at 15:32