I am not an R pro, merely a humble user merging different pieces of information into one script that fits my needs. Sadly, I stumbled upon a problem I could not solve, which is why I came here:
I'd like to combine data from many XML files into one data frame. One of the variables/columns (extracted from a specific node) is quite large (lots of text with many line breaks etc.). When I parse the XMLs, coerce the extracted information into a data frame and write it to a tab-delimited .txt file, R does not write this one large variable as a single column, which makes it impossible to work with the output as a data frame.
To solve this conundrum, I'd like to insert something like
gsub("[\t\n]", "", xmlValue)
as an argument of the sixth xpathSApply function to get rid of the linebreaks. How can it be integrated? Or is there another answer?
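For reference, a minimal sketch of one way the cleanup might be wired in. As far as I understand it, gsub() needs the extracted string rather than the function xmlValue itself, so the idea is to wrap both in a small function and hand that to xpathSApply in place of xmlValue. The helper name clean_value and the inline sample document are made up for illustration; replacing with a space instead of "" is also just an assumption, chosen so that words on neighbouring lines do not get glued together.

```r
library(XML)

# Hypothetical helper (not part of the XML package): take one node,
# extract its text, and collapse runs of tabs / line breaks into one space.
clean_value <- function(node) {
  gsub("[\t\r\n]+", " ", xmlValue(node))
}

# Tiny inline document standing in for one of the real files.
doc <- xmlParse("<dokument><html>line one\nline\ttwo</html></dokument>",
                asText = TRUE)
root <- xmlRoot(doc)

# Same call shape as in the script, with clean_value instead of xmlValue.
html_text <- xpathSApply(root, "//dokument/html", clean_value)
```

If this works as intended, only the sixth xpathSApply call in the script would need to change; the other five could keep using xmlValue directly.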
Here is my code so far:
rm(list = ls())
setwd("L:/.../testfiles")
library(XML)
list.files(path = ".", pattern = NULL, all.files = FALSE,
           full.names = FALSE, recursive = FALSE,
           ignore.case = FALSE, include.dirs = FALSE, no.. = FALSE)
files <- dir(path = ".", pattern = NULL, all.files = FALSE,
             full.names = FALSE, recursive = FALSE,
             ignore.case = FALSE, include.dirs = FALSE, no.. = FALSE)
for(i in 1:2){
  tryCatch({
    motion <- xmlParse(files[i])
    root <- xmlRoot(motion)
    frame <- data.frame(
      "." = xpathSApply(root, "//dokument/datum", xmlValue),
      "." = xpathSApply(root, "//dokument/subtyp", xmlValue),
      "." = xpathSApply(root, "//dokument/titel", xmlValue),
      "." = xpathSApply(root, "//dokument/subtitel", xmlValue),
      "." = xpathSApply(root, "//dokintressent//namn", xmlValue),
      "." = xpathSApply(root, "//dokument/html", xmlValue), ## <- the huge node
      check.names=FALSE, check.rows=FALSE)
    colnames(frame)[1] <- ""
    colnames(frame)[2] <- ""
    colnames(frame)[3] <- ""
    colnames(frame)[4] <- ""
    colnames(frame)[5] <- ""
    colnames(frame)[6] <- ""
    write.table(frame, "L:/.../Satz.txt",
                sep="\t", append=TRUE, na="NA", row.names=FALSE)
  }, error=function(e){cat("ERROR :",conditionMessage(e), "\n")})
}
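A side note on the write.table call in the loop: with append=TRUE and the default col.names=TRUE, the header line is written again on every iteration, which may also get in the way of reading the file back as one data frame. Below is a rough sketch of writing the header only once. The column names, the sample rows and the temporary file are all placeholders, not the real data or output path.

```r
# Temporary file standing in for the real output path.
out_file <- tempfile(fileext = ".txt")

# Two made-up one-row frames standing in for the per-file results.
chunks <- list(
  data.frame(titel = "a", html = "text one", stringsAsFactors = FALSE),
  data.frame(titel = "b", html = "text two", stringsAsFactors = FALSE)
)

for (chunk in chunks) {
  first <- !file.exists(out_file)   # TRUE only before the first write
  write.table(chunk, out_file, sep = "\t",
              append = !first,      # append on every write after the first
              col.names = first,    # header line only on the first write
              row.names = FALSE)
}

# Read the result back to check that it parses as a single data frame.
back <- read.delim(out_file, stringsAsFactors = FALSE)
```

In the real script the same pattern would presumably need the output file to be absent (or deleted) before the loop starts, since the header decision depends on whether the file already exists.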
Sample data: 2 files (I am not allowed to post more - just click the links and select "slow download"). http://speedy.sh/S2JsU/modifiedFile1.xml http://speedy.sh/Ce2Jg/modifiedFile2.xml
I seriously hope the solution to this is as hard as it appears to me, and that I don't have to be too ashamed of asking it in front of this noble community.
Thank you all so much!
ch