I am doing a lot of analysis with the TM
package. One of my biggest problems are related to stemming and stemming-like transformations.
Let's say I have several accounting related terms (I am aware of the spelling issues).
After stemming we have:
accounts -> account
account -> account
accounting -> account
acounting -> acount
acount -> acount
acounts -> acount
accounnt -> accounnt
Result: 3 Terms (account, acount, account) where I would have liked 1 (account) as all these relate to the same term.
1) To correct spelling is a possibility, but I have never attempted that in R. Is that even possible?
2) The other option is to make a reference list i.e. account = (accounts, account, accounting, acounting, acount, acounts, accounnt) and then replace all occurrences with the master term. How would I do this in R?
Once again, any help/suggestions would be greatly appreciated.