I have a case very similar to this one (Load XML to Dataframe in R with parent node attributes): I'm trying to convert XML to a data frame, but I can't handle the cases where the "sp" and "l" nodes are missing. (I don't care about node "m".) Suppose my XML looks like this:
<text>
<body>
<div1 type="scene1" n="1">
<sp who="fau">
<l c="30" a="Settle thy studies"/>
<m x="40" b="To sound the depth of that thou wilt profess"/>
</sp>
<sp who="eang">
<m x="105" b="Go forward, Faustus, in that famous art"/>
</sp>
</div1>
<div1 type="scene2" n="2">
<sp who="fau">
<l c="31" a="Settle thy"/>
<m x="50" b="To sound the depth of"/>
</sp>
<sp who="fau">
<l c="32" a="Settle"/>
<m x="60" b="To sound the"/>
</sp>
<sp who="fau">
<l c="33" a="Settle thy studies, Faustus"/>
<m x="40" b="To sound the depth of that thou wilt"/>
</sp>
</div1>
<div1 type="scene3" n="3">
</div1>
<div1 type="scene4" n="4">
</div1>
<div1 type="scene5" n="5">
</div1>
</body>
</text>
This is what I would like to obtain:
n type   lc la
1 scene1 30 Settle thy studies
2 scene2 31 Settle thy
2 scene2 32 Settle
2 scene2 33 Settle thy studies, Faustus
3 scene3 NA NA
4 scene4 NA NA
5 scene5 NA NA
I’ve tried this:
library(XML)
doc <- xmlTreeParse("play.xml", useInternalNodes = TRUE)
bodyToDF <- function(x) {
  n <- xmlGetAttr(x, "n")
  type <- xmlGetAttr(x, "type")
  sp <- xpathApply(x, 'sp', function(sp) {
    # meant to cover a missing node, but sp is never NULL here
    if (is.null(sp)) {
      lc <- NA
      la <- NA
    }
    lc <- xpathSApply(sp, 'l', function(l) { xmlGetAttr(l, "c") })
    la <- xpathSApply(sp, 'l', function(l) { xmlGetAttr(l, "a") })
    data.frame(n, type, lc, la)
  })
  do.call(rbind, sp)
}
res <- xpathApply(doc, '//div1', bodyToDF)
but it fails with:
Error in data.frame(n, type, lc, la) :
arguments imply differing number of rows: 1, 0
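As far as I can tell, the message comes from data.frame() itself: when an "sp" has no "l" child, lc and la come back empty while n and type have length 1. Here is a minimal base-R sketch of that situation. The values are placeholders rather than anything read from play.xml, and character(0) is only my stand-in for the empty XPath result:

```r
# Placeholder values standing in for one <sp> that has no <l> child.
n <- "1"
type <- "scene1"
lc <- character(0)  # nothing matched, so length 0
la <- character(0)

# data.frame() recycles length-1 columns, but not against length-0 ones,
# so this call raises an error; tryCatch() captures its message.
msg <- tryCatch(
  data.frame(n, type, lc, la),
  error = function(e) conditionMessage(e)
)
print(msg)
```

So the is.null(sp) check never helps: the empty result shows up one level down, in the "l" lookup.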
I also tried this:
div1 = sapply(c("n","type"), function(x) xpathSApply(doc, "//div1", xmlGetAttr, x), simplify=FALSE)
l = sapply(c("c","a"), function(x) xpathSApply(doc, "//l", xmlGetAttr, x), simplify=FALSE)
df <- data.frame(div1,l)
but then the "l" nodes no longer line up with their parent "div1" rows, because there are 5 "div1" elements and only 4 "l" elements:
Error in data.frame(div1, l) :
arguments imply differing number of rows: 5, 4
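The direction I think this needs to go is to work per "div1": collect its "l" nodes, and fall back to a single NA row when there are none. Below is a rough sketch of that idea, not a confirmed solution. div_to_df is a made-up helper name, and the inline XML is a cut-down stand-in for play.xml with straight quotes throughout:

```r
library(XML)

# Cut-down stand-in for play.xml (assumption: same structure as the real file).
xml_text <- '<text><body>
  <div1 type="scene1" n="1">
    <sp who="fau"><l c="30" a="Settle thy studies"/><m x="40" b="To sound"/></sp>
    <sp who="eang"><m x="105" b="Go forward"/></sp>
  </div1>
  <div1 type="scene2" n="2">
    <sp who="fau"><l c="31" a="Settle thy"/></sp>
    <sp who="fau"><l c="32" a="Settle"/></sp>
  </div1>
  <div1 type="scene3" n="3"></div1>
</body></text>'

doc <- xmlParse(xml_text, asText = TRUE)

# Hypothetical helper: one data frame per <div1>, with an NA row if it has no <l>.
div_to_df <- function(div) {
  l_nodes <- getNodeSet(div, "./sp/l")
  if (length(l_nodes) == 0) {
    lc <- NA_character_
    la <- NA_character_
  } else {
    lc <- vapply(l_nodes, xmlGetAttr, character(1),
                 name = "c", default = NA_character_)
    la <- vapply(l_nodes, xmlGetAttr, character(1),
                 name = "a", default = NA_character_)
  }
  # n and type have length 1 and are recycled across the <l> rows.
  data.frame(n = xmlGetAttr(div, "n"), type = xmlGetAttr(div, "type"),
             lc = lc, la = la, stringsAsFactors = FALSE)
}

res <- do.call(rbind, lapply(getNodeSet(doc, "//div1"), div_to_df))
print(res)
```

On this sample it gives one row per "l" plus an NA row for the empty scene, but I'm not sure it is the idiomatic way to do it with the XML package.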
Any ideas? Thank you.