I have a very large XML file (>70 GB) from which I only need to read some segments. However, I don't know the structure of the file, and my attempts to extract it failed because of the file's size.
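To see what the file looks like, there is no need to parse it: reading a few kilobytes from the start usually shows the root element and the first few records. The following is a minimal base-R sketch. `peek_file` is a hypothetical helper name, and the 5000-byte default is an arbitrary choice.

```r
# Hypothetical helper: return the first `n_bytes` bytes of a file as one string.
# readChar() stops after n_bytes, so the rest of the file is never read, and
# unlike readLines() it does not depend on the file containing line breaks.
peek_file <- function(path, n_bytes = 5000L) {
  readChar(path, nchars = n_bytes, useBytes = TRUE)
}
```

For example, `cat(peek_file("Final.xml"))` prints the opening of the document. Element names that repeat there are the candidates for the `branches` list passed to `xmlEventParse`.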
I don't need to read the full file or convert it to a data frame, only to extract specific parts. But because I don't have the structure, I don't know what format those parts take.
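To find out which element names exist and how they nest, without building the whole tree in memory, you can give `xmlEventParse` only `startElement`/`endElement` handlers that record the path of each opening tag. The following is a sketch that assumes the XML package is installed. `count_element_paths` is a hypothetical name. It still reads the whole file once, so it will take a while on 70 GB, but memory use grows only with the number of distinct paths.

```r
library(XML)

# Hypothetical helper: stream an XML file once and count how often each
# element path (e.g. "root/rec/a") occurs. Only the counts are kept,
# never the document tree.
count_element_paths <- function(path) {
  counts <- new.env()     # path -> number of occurrences
  stack <- character(0)   # names of the currently open elements

  start_el <- function(name, attrs, ...) {
    stack <<- c(stack, name)
    key <- paste(stack, collapse = "/")
    counts[[key]] <- if (is.null(counts[[key]])) 1L else counts[[key]] + 1L
  }
  end_el <- function(name, ...) {
    stack <<- stack[-length(stack)]   # leave the element that just closed
  }

  xmlEventParse(path,
                handlers = list(startElement = start_el, endElement = end_el))

  keys <- sort(ls(counts, all.names = TRUE))
  vapply(keys, function(k) counts[[k]], integer(1))
}
```

A deep path with a very high count (for example, a hypothetical `root/record` occurring millions of times) is usually the repeating record element. Its name is what the `branches` list should use.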
I tried xmlParse, and also xmlEventParse, following the approach suggested in: How to read large (~20 GB) xml file in R?
The code suggested there returns an empty data frame:
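library(XML)

xmlDoc <- "Final.xml"
result <- NULL

# Build the branch handlers for xmlEventParse: each list name is an element
# name, and its function receives the complete node for every element with
# that name. Only elements literally named <ROW> are handled here.
row.sax <- function() {
  ROW <- function(node) {
    children <- xmlChildren(node)
    # drop text children (e.g. whitespace between child elements)
    children[which(names(children) == "text")] <- NULL
    # append one row holding the text value of each child element
    result <<- rbind(result, sapply(children, xmlValue))
  }
  branches <- list(ROW = ROW)
  return(branches)
}

# call xmlEventParse, which streams through the file
xmlEventParse(xmlDoc, handlers = list(), branches = row.sax(),
              saxVersion = 2, trim = FALSE)

# and here is your data.frame
result <- as.data.frame(result, stringsAsFactors = FALSE)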
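The empty data frame most likely means the file has no elements named `<ROW>`. That name comes from the file in the linked question, and `branches = list(ROW = ROW)` only fires for tags with exactly that name. The following is a minimal sketch, assuming the name of the repeating record element is already known. `extract_records` and its `record_name` argument are hypothetical, and it assumes every record has the same child elements in the same order.

```r
library(XML)

# Hypothetical helper: stream `path` and turn every <record_name> element into
# one row whose columns are that element's child elements.
# Assumes every record has the same children in the same order.
extract_records <- function(path, record_name) {
  rows <- list()  # one named character vector per record

  grab <- function(node) {
    kids <- xmlChildren(node)
    kids[which(names(kids) == "text")] <- NULL  # keep element children only
    # paste(collapse = "") turns an empty value into "" instead of character(0)
    rows[[length(rows) + 1L]] <<- vapply(
      kids, function(k) paste(xmlValue(k), collapse = ""), character(1)
    )
  }

  # The list name must be a tag name that actually occurs in the file
  branches <- setNames(list(grab), record_name)
  xmlEventParse(path, handlers = list(), branches = branches,
                saxVersion = 2, trim = FALSE)

  # Bind all rows once at the end instead of calling rbind per record
  as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
}
```

Binding once at the end avoids copying an ever-growing matrix on every record, which matters when the file holds millions of them.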
I have little experience with XML, so I don't fully understand the solution I tried.
Thanks for your help!