I have been trying to import the file with the following code:

reuters <- Corpus(DirSource(directory = "E:\\R Programs\\Test\\Reuteurs\\reut2-000.xml", encoding = "UTF-8"), 
   readerControl = list(reader = readReut21578XMLasPlain))

However, I get the following error:

Error in DirSource(directory = "E:\\R Programs\\Test\\Reuteurs\\reut2-000.xml",  : 
  empty directory

I have also checked other solutions provided on Stack Overflow, but they are not working for me.

The code below does work, though. Why is the DirSource method not working for me? Am I missing anything?

reuters <- Corpus(URISource("file://E:\\R Programs\\Test\\Reuteurs\\reut2-000.xml",encoding="UTF-8"), 
   readerControl = list(reader = readReut21578XMLasPlain))

Links I referred to:

R: Got problems in reading text file

Using R for Text Mining Reuters-21578

R Error in trying to access local data

Community
samy

3 Answers

reut2-000.xml is probably a file and not a directory?

DirSource expects the path to a directory, so opening a file as a directory will cause an error.
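
A minimal sketch of what that could look like, assuming the XML file sits in the folder shown in the question: pass the containing folder to DirSource and use its pattern argument to pick out the one file. The helper dir_source_args is hypothetical (it is not part of tm), and the path is the question's path rewritten with forward slashes.

```r
# Hypothetical helper (not part of tm): turn a path to a single file into
# the directory/pattern pair that DirSource() expects.
dir_source_args <- function(path) {
  list(
    directory = dirname(path),                   # folder that holds the file
    pattern   = utils::glob2rx(basename(path))   # regex matching only that file name
  )
}

# Path from the question, written with forward slashes.
xml_file <- "E:/R Programs/Test/Reuteurs/reut2-000.xml"
src <- dir_source_args(xml_file)

# Only attempt the import when tm is installed and the file is really there.
if (requireNamespace("tm", quietly = TRUE) && file.exists(xml_file)) {
  reuters <- tm::Corpus(
    tm::DirSource(directory = src$directory, pattern = src$pattern,
                  encoding = "UTF-8"),
    readerControl = list(reader = tm::readReut21578XMLasPlain)
  )
}
```

Dropping the pattern argument should read every file in the folder instead, which is the more usual way to use DirSource.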

Has QUIT--Anony-Mousse

I would suggest that you use the preprocessed Reuters Corpus from R package tm.corpus.Reuters21578 (as I've already recommended here: Using R for Text Mining Reuters-21578).

install.packages("tm.corpus.Reuters21578", repos = "http://datacube.wu.ac.at")
library(tm.corpus.Reuters21578)
data(Reuters21578)

These are the same data as in the original Reuters XML files, but without the issues with encoding, the missing XML declaration, etc.

Community
Lenka Vraná

Finally, I found a way around this error (fread comes from the data.table package):

words <- Corpus(VectorSource(fread(file, encoding = "UTF-8", sep = ",", verbose = TRUE)))

Hope this helps.

DAA