I currently have ~20,000 XML files ranging in size from a few KB to a few MB. Although it may not be ideal, I am using the "xmlTreeParse" function from the XML package to loop through each file, extract the text I need, and save the result as a CSV file.
The code below works fine for files <1 MB in size:
library(XML)

files <- list.files()
for (i in files) {
  doc <- xmlTreeParse(i, useInternalNodes = TRUE)
  root <- xmlRoot(doc)
  name <- xmlValue(root[[8]][[1]][[1]]) # Name
  data <- xmlValue(root[[8]][[1]])      # Full text
  x <- data.frame(name = name, data = data, stringsAsFactors = FALSE)
  # paste0 avoids the stray space that paste() would insert into the file name
  write.csv(x, paste0(i, ".csv"), row.names = FALSE, na = "")
}
The trouble is that any file >1 MB gives me the following error:
Error: 1: Excessive depth in document: 256 use XML_PARSE_HUGE option
2: Extra content at the end of the document
Please forgive my ignorance, but I have tried searching for "XML_PARSE_HUGE" in the XML package documentation and can't seem to find it. Has anyone had any experience with this option? If so, I would greatly appreciate any advice on how to get this code to handle slightly larger XML files.
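For what it's worth, my best guess from the wording of the error is that XML_PARSE_HUGE is a libxml2 parser flag rather than an R function, and that it might be passed through the options argument of xmlTreeParse, assuming the XML package exposes a HUGE constant for it, something like this (untested):

# Untested guess: pass the libxml2 XML_PARSE_HUGE flag via the options
# argument, assuming the XML package exports a HUGE parser constant
doc <- xmlTreeParse(i, useInternalNodes = TRUE, options = HUGE)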
Thanks!