
When using the stemDocument function from the tm (text mining) R package, the word "already" is converted to "alreadi".

For example, I am analyzing a number of tweets stored in a corpus.

One of the tweets looks like this before the stemming command is run:

inspect(myCorpus[98])
<<VCorpus (documents: 1, metadata (corpus/indexed): 0/0)>>

[[1]]
<<PlainTextDocument (metadata: 7)>>
select   member  jeffroky  attending sqlsat   true  already eventdt httptcoquyndcgs sqlpass

After executing the following line of code:

myCorpus <- tm_map(myCorpus, stemDocument, language = "english")
inspect(myCorpus[98])

I obtain the following result:

[[1]] 
PlainTextDocument (metadata: 7) 
select   member  jeffroki  attend sqlsat   true alreadi eventdt   httptcoquyndcg sqlpass

Please note the change of the word "already" to "alreadi". Can someone shed some light on this behaviour?

Thanks! Luis


1 Answer


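This is expected: the English stemmer rewrites a trailing "y" as "i", so stems are not always dictionary words. To map a stem back to a full word, use the stemCompletion function from tm. Try: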

stemCompletion("alreadi", dictionary = myCorpus)

For more details, refer to this post: https://stackoverflow.com/a/25391686/2748373
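As a minimal sketch of the stem-then-complete round trip, the snippet below stems a few words from the tweet and then tries to complete them against the unstemmed words. It assumes the tm and SnowballC packages are installed. The `original` vector is a hypothetical stand-in for a copy of the corpus saved before stemming, since a dictionary built from the already-stemmed myCorpus would only contain the stems. As far as I can tell, stemCompletion matches each stem as a prefix of the dictionary words, so a stem whose last letter was changed (such as "alreadi") may not be completed.

```r
library(tm)        # provides stemCompletion()
library(SnowballC) # provides wordStem(), the stemmer tm relies on

# Hypothetical stand-in for words taken from an unstemmed copy of the corpus
original <- c("already", "attending", "member")

# Stem each word with the English stemmer
stems <- wordStem(original, language = "english")
print(stems)

# Try to map each stem back to a full word from the unstemmed dictionary.
# Stems that are not a prefix of any dictionary word may come back as NA.
completed <- stemCompletion(stems, dictionary = original)
print(completed)
```

With a real corpus, the same idea would be to keep a copy of myCorpus before calling tm_map with stemDocument, and pass that copy as the dictionary argument.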
