
I just tried out this quite interesting YouTube R tutorial about building a text mining machine: http://www.youtube.com/watch?v=j1V2McKbkLo

So far, my complete code is:

# Tutorial: http://www.youtube.com/watch?v=j1V2McKbkLo

# init

libs <- c("tm", "plyr", "class")
lapply(libs, require, character.only = TRUE)

# set options
options(stringsAsFactors = FALSE)

# set parameters
candidates <- c("Obama", "Romney")
pathname <- "C:/Users/***"      # path redacted for anonymity

# clean text
cleanCorpus <- function(corpus){
    corpus.tmp <- tm_map(corpus, removePunctuation)
    corpus.tmp <- tm_map(corpus.tmp, stripWhitespace)
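    # tm >= 0.6 expects base functions such as tolower to be wrapped in content_transformer()
    corpus.tmp <- tm_map(corpus.tmp, content_transformer(tolower))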
    corpus.tmp <- tm_map(corpus.tmp, removeWords, stopwords("english"))
    return(corpus.tmp)
}

# build TDM
generateTDM <- function(cand, path){
    s.dir <- sprintf("%s/%s", path, cand)
    s.cor <- Corpus(DirSource(directory = s.dir, encoding = "ANSI"))
    s.cor.cl <- cleanCorpus(s.cor)
    s.tdm <- TermDocumentMatrix(s.cor.cl)
    s.tdm <- removeSparseTerms(s.tdm, 0.7)
    result <- list(name = cand, tdm = s.tdm)
}

tdm = lapply(candidates, generateTDM, path = pathname)

When I try to run this, I always get the following error message:

tdm = lapply(candidates, generateTDM, path = pathname)

Error in DirSource(directory = s.dir, encoding = "ANSI") : 
  empty directory

and I just can't figure out where the error is. I tried several ways of writing the directory path, but none of them works. I am unsure whether the problem is that RStudio cannot access locally saved data or whether it lies in the code itself, and I would be absolutely happy if anybody could help me or give any hints.
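
One way to narrow this down is to ask R directly what it can see at the path that `generateTDM` builds. The following is only a minimal sketch using base R: `checkCorpusDir` is a hypothetical helper (not part of tm), and the temporary directory stands in for the real, redacted path.

```r
# Hypothetical helper (not part of tm): report what R can see at path/cand.
checkCorpusDir <- function(path, cand) {
    s.dir <- file.path(path, cand)                  # same layout as sprintf("%s/%s", path, cand)
    files <- list.files(s.dir, full.names = TRUE)   # character(0) if nothing is visible
    list(dir = s.dir, exists = dir.exists(s.dir), n.files = length(files))
}

# Demonstration on a temporary directory so the sketch runs anywhere.
demo.root <- file.path(tempdir(), "tm-demo")
dir.create(file.path(demo.root, "Obama"), recursive = TRUE, showWarnings = FALSE)
writeLines("sample speech text", file.path(demo.root, "Obama", "speech1.txt"))

str(checkCorpusDir(demo.root, "Obama"))    # exists: TRUE, n.files: 1
str(checkCorpusDir(demo.root, "Romney"))   # exists: FALSE, n.files: 0
```

If `exists` is `FALSE` or `n.files` is 0 for the real path, the R session presumably cannot see the files at that location (for example because R runs on a different machine than the one holding the files), and the problem would then be the path rather than the text mining code.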

Thank you!

vrajs5
  • Please add `print(s.dir)` to `generateTDM ` and re-run the code. Are you sure this shows the correct full path? – tonytonov Jun 10 '14 at 10:12
  • Thank you for this hint. It now shows the path, but still tells me that the directory is empty. :/ – user3725523 Jun 10 '14 at 11:25
  • I think the main mistake was that RStudio could not connect to the locally saved files. We have now uploaded the files to the home directory using Filezilla; I hope it will work... – user3725523 Jun 10 '14 at 12:34

1 Answer


On Windows, path components are natively separated by \ (not /), and in R strings you need to type "\\" to get a single \. Thus, you can (hopefully) solve your problem by defining pathname as follows:

pathname <- "C:\\Users\\***"

(of course writing the correct path instead of the stars).
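
To illustrate the escaping, here is a small sketch in base R. The path `C:\Users\demo` is a made-up placeholder, not the real directory. As far as I know, R on Windows also accepts forward slashes, so both spellings should end up pointing at the same directory.

```r
# Hypothetical placeholder paths; replace with the real directory.
p.back <- "C:\\Users\\demo"   # escaped backslashes in the R source
p.fwd  <- "C:/Users/demo"     # forward slashes

writeLines(p.back)            # prints C:\Users\demo (one backslash each)
nchar(p.back)                 # 13: each "\\" is stored as a single character

# Converting backslashes to forward slashes gives the other spelling.
identical(chartr("\\", "/", p.back), p.fwd)   # TRUE

# file.path() joins components with "/" on every platform.
file.path(p.fwd, "Obama")     # "C:/Users/demo/Obama"
```

If both spellings still produce the "empty directory" error, the separator is probably not the cause, and it is worth checking whether the directory is visible to the R session at all.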

jochen
  • Thank you! I incorporated this into the code but still get the same error: `> tdm = lapply(candidates, generateTDM, path = pathname) [1] "C:\\Users\\***\\Obama" Error in DirSource(directory = s.dir, encoding = "ANSI") : empty directory` even though there definitely are files in that directory. I would be really happy about further hints. – user3725523 Jun 10 '14 at 11:33