I just tried out this interesting YouTube R tutorial about building a text mining machine: http://www.youtube.com/watch?v=j1V2McKbkLo
So far, the complete code I have is:
# Tutorial: http://www.youtube.com/watch?v=j1V2McKbkLo
# init
libs <- c("tm", "plyr", "class")
lapply(libs, require, character.only = TRUE)
# set options
options(stringsAsFactors = FALSE)
# set parameters
candidates <- c("Obama", "Romney")
pathname <- "C:/Users/***" # actual path redacted for anonymity
# clean text
cleanCorpus <- function(corpus) {
  corpus.tmp <- tm_map(corpus, removePunctuation)
  corpus.tmp <- tm_map(corpus.tmp, stripWhitespace)
  corpus.tmp <- tm_map(corpus.tmp, tolower)
  corpus.tmp <- tm_map(corpus.tmp, removeWords, stopwords("english"))
  return(corpus.tmp)
}
# build TDM
generateTDM <- function(cand, path) {
  # each candidate's speeches are expected in a subfolder named after the candidate
  s.dir <- sprintf("%s/%s", path, cand)
  s.cor <- Corpus(DirSource(directory = s.dir, encoding = "ANSI"))
  s.cor.cl <- cleanCorpus(s.cor)
  s.tdm <- TermDocumentMatrix(s.cor.cl)
  s.tdm <- removeSparseTerms(s.tdm, 0.7)
  result <- list(name = cand, tdm = s.tdm)
}
tdm = lapply(candidates, generateTDM, path = pathname)
When I try to run this, I constantly get the following error message:
tdm = lapply(candidates, generateTDM, path = pathname)
Error in DirSource(directory = s.dir, encoding = "ANSI") :
empty directory
I just can't figure out where the error is. I tried several ways of writing the directory path, but none of them works. I am unsure whether RStudio cannot access the locally saved files or whether the problem lies in the code itself. I would be very grateful if anybody could help me or give any hints.
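To narrow this down, here is a minimal sketch of a check that reports what R actually finds at each candidate's folder. It assumes the same layout as above (one subfolder per candidate under the base path). The helper name `checkCandidateDir` is my own invention for illustration and is not from the tutorial; it only uses base R functions.

```r
# Hypothetical helper (not part of the tutorial): report whether the
# candidate's folder exists and which files R can see inside it.
checkCandidateDir <- function(cand, path) {
  s.dir <- file.path(path, cand)   # same folder that generateTDM builds
  files <- list.files(s.dir)       # character(0) if the folder is missing or empty
  list(dir = s.dir,
       exists = dir.exists(s.dir),
       n.files = length(files),
       files = files)
}
```

Calling `lapply(candidates, checkCandidateDir, path = pathname)` should then show, for each candidate, whether the folder exists and how many files it contains.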
Thank you!