I am trying to do some work with the well-known Reuters-21578 dataset and am having trouble loading the .sgm files into my corpus.
Right now I am using the following code in an attempt to load all of the files into my corpus:

    require(tm)
    reut21578 <- system.file("reuters21578", package = "tm")
    reuters <- Corpus(DirSource(reut21578),
                      readerControl = list(reader = readReut21578XML))

However, this gives me the following error:

    Error in DirSource(reut21578) : empty directory

Any idea where I may be going wrong?