I have attempting to find the frequency per term in Martin Luther King's "I Have a Dream" speech. I have converted all uppercase letters to lowercase and I have removed all stop words. I have the text in a .txt file so I cannot display it on here. The code that reads in the file is below:
speech <- readLines(speech.txt)
Then I performed the conversion to lowercase and removal of stop words successfully and called it:
clean.speech
Now I am having some issues with finding the frequency per term. I have created a corpus, inspected my corpus, and created a TermDocumentMatrix as follows:
myCorpus <- Corpus(VectorSource(clean.speech))
inspect(myCorpus)
TDM <- TermDocumentMatrix(myCorpus)
Everything is fine up to this point. However, I then wrote the following code and got the warning message of:
m < as.matrix(TDM)
Warning Message:
"In m < as.matrix(TDM): longer object length is not a multiple of shorter object length
I know this is a very common warning message, so I Googled it first, but I could not find anything pertaining to frequency of terms. I proceeded to run the following text, to see if it would run with a warning message, but it did not.
v <- sort(rowSums(m), decreasing = TRUE)
d <- data.frame(word=names(v), freq=v)
head(d, 15)
My goal is just to find the frequency of terms. I sincerely apologize for asking this question because I know this question gets asked a lot. I just do not understand what to change about my code. Thank you everyone I appreciate it!