I have 10 rows of text data in a CSV file, and I want to correct the various misspellings in it. For example, the word "battery" appears misspelled as "battere", "batt", etc. I considered using stemDocument followed by stemCompletion, and so used the following code:
library(tm)
library(SnowballC)
text.var<-read.csv("C:\\Users\\Sambit\\Desktop\\Sample Data.csv",header=FALSE)
data_corp<-Corpus(VectorSource(text.var))
data_corp.copy<-data_corp
data_corp<-tm_map(data_corp, stemDocument)
data_corp<-tm_map(data_corp, stemCompletion, dictionary=data_corp.copy)
However, the last step, the stemCompletion call, fails with the following error:
Error in setNames(if (length(n)) n else rep(NA, length(x)), x) :
'names' attribute [10] must be the same length as the vector [2]
In addition: Warning messages:
1: In grep(sprintf("^%s", w), dictionary, value = TRUE) :
argument 'pattern' has length > 1 and only the first element will be used
2: In grep(sprintf("^%s", w), dictionary, value = TRUE) :
argument 'pattern' has length > 1 and only the first element will be used
Where might I have gone wrong?