I have some XML data shaped like this:
<people>
<person first="Mary" last="Jane" sex="F" />
<person first="Susan" last="Smith" sex="F" height="168" />
<person last="Black" first="Joseph" sex="M" />
<person first="Jessica" last="Jones" sex="F" />
</people>
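To make the snippets below reproducible, here is the sample data stored as a single R string. The variable name xml matches what the question's code later passes to xmlParse(xml). How the string is actually loaded in practice (file, URL, etc.) is an assumption left open here.

```r
# The sample <people> document from above, stored as one character string
# named `xml` so that xmlParse(xml) in the question's code has an input.
xml <- '<people>
  <person first="Mary" last="Jane" sex="F" />
  <person first="Susan" last="Smith" sex="F" height="168" />
  <person last="Black" first="Joseph" sex="M" />
  <person first="Jessica" last="Jones" sex="F" />
</people>'
```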
I would like a data frame that looks like this:
    first  last sex height
1    Mary  Jane   F     NA
2   Susan Smith   F    168
3  Joseph Black   M     NA
4 Jessica Jones   F     NA
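Writing the target out by hand pins down the column types the question asks for. This is a sketch under an assumption: first and last are character (as requested), sex is also kept as character (the question does not say), and height is numeric with NA where the attribute is missing. The name expected is hypothetical.

```r
# Hand-written target data frame (assumed types: character for
# first/last/sex, numeric for height, NA where height is absent).
expected <- data.frame(
  first  = c("Mary", "Susan", "Joseph", "Jessica"),
  last   = c("Jane", "Smith", "Black", "Jones"),
  sex    = c("F", "F", "M", "F"),
  height = c(NA, 168, NA, NA),
  stringsAsFactors = FALSE
)
str(expected)
```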
I've gotten this far:
library(XML)
xpeople <- xmlRoot(xmlParse(xml))
lst <- xmlApply(xpeople, xmlAttrs)
names(lst) <- seq_along(lst)
But I can't for the life of me figure out how to turn that list into a data frame. I can make the list "square" (i.e., fill in the missing height attributes) and then convert it to a data frame:
lst <- xmlApply(xpeople, function(node) {
  attrs <- xmlAttrs(node)
  # Add a missing height attribute so every vector has the same length
  if (!("height" %in% names(attrs))) {
    attrs[["height"]] <- NA
  }
  attrs
})
df <- as.data.frame(lst)
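Two of the problems listed below follow from how the list is combined. as.data.frame() on a list turns each list element into a column, so each person becomes a column. It also lines values up by position: only the first element's names are used, and the attribute names of the other elements are ignored, so a node whose attributes appear in a different order (Joseph Black) ends up misaligned. A minimal base-R illustration of the positional part, using rbind() so each person is a row; the vectors mary and joseph are hypothetical stand-ins for two xmlAttrs() results:

```r
# Two records whose attributes come back in different orders,
# as with Joseph Black in the sample XML.
mary   <- c(first = "Mary",  last = "Jane")
joseph <- c(last  = "Black", first = "Joseph")

# rbind() takes the column names from the first named vector and then
# binds purely by position, so "Black" lands under "first".
by_position <- rbind(mary, joseph)
print(by_position)

# Reordering each vector by name before binding keeps the columns aligned.
by_name <- rbind(mary, joseph[names(mary)])
print(by_name)
```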
But this has the following problems:
- The data frame is transposed (each person is a column rather than a row)
- first and last are factors, not character
- height is a factor, not numeric
- the first and last names got swapped for Joseph Black (not a big issue, since my data is normally consistent, but annoying nonetheless)
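One way to address all four points is to index each attribute vector by name instead of by position, then bind the results as rows. This is a sketch, not a definitive solution: it assumes lst is a list of named character vectors like the output of xmlApply(xpeople, xmlAttrs), and to stay self-contained it writes that list out by hand. The helper names cols and mat are hypothetical.

```r
# Hypothetical stand-in for xmlApply(xpeople, xmlAttrs): a list of named
# character vectors whose attribute order and length vary per node.
lst <- list(
  c(first = "Mary",    last = "Jane",   sex = "F"),
  c(first = "Susan",   last = "Smith",  sex = "F", height = "168"),
  c(last  = "Black",   first = "Joseph", sex = "M"),
  c(first = "Jessica", last = "Jones",  sex = "F")
)

# Union of all attribute names, in first-seen order.
cols <- unique(unlist(lapply(lst, names)))

# Index each vector by name: missing attributes come back as NA, and a
# different attribute order (Joseph) no longer matters. vapply() returns
# one column per person, so t() turns people into rows.
mat <- t(vapply(lst, function(a) unname(a[cols]),
                character(length(cols)), USE.NAMES = FALSE))
colnames(mat) <- cols

# Keep strings as character, then convert height to numeric.
df <- as.data.frame(mat, stringsAsFactors = FALSE)
df$height <- as.numeric(df$height)
print(df)
```

Indexing by name is what fixes the swap; stringsAsFactors = FALSE and the explicit as.numeric() handle the type issues.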
How can I get the data frame in the correct form?