I need to load an XML file into a data frame in R. The XML format is shown below. How can I achieve this?

         <?xml version="1.0" encoding="utf-8"?><posts>  <row Id="1" PostTypeId="1" AcceptedAnswerId="17" CreationDate="2010-07-26T19:14:18.907" Score="6"/></posts>

I tried the code below, but it does not give the desired output. I am expecting tabular output, with the attribute names as column names and their values listed underneath.

library(XML)
xml.url ="test.xml"
xmlfile = xmlTreeParse(xml.url)

class(xmlfile)
xmltop=xmlRoot(xmlfile)

print(xmltop)[1:2]

plantcat <- xmlSApply(xmltop, function(x) xmlSApply(x, xmlValue))

plantcat_df <- data.frame(t(plantcat))
Zack
  • What exactly is the structure of the desired output? Have you tried anything at all? We're not here to write code for you. You should show what you've attempted and describe how it fails. I assume your attempts to google the problem at least led you to the `XML` package for R to parse your input. – MrFlick Nov 08 '14 at 20:35
  • Hello, I tried the code below: `library(XML); xml.url = "test.xml"; xmlfile = xmlTreeParse(xml.url); class(xmlfile); xmltop = xmlRoot(xmlfile); print(xmltop)[1:2]; plantcat <- xmlSApply(xmltop, function(x) xmlSApply(x, xmlValue)); plantcat_df <- data.frame(t(plantcat))` – Zack Nov 08 '14 at 20:48
  • I am expecting tabular output with the columns "row Id", "PostTypeId", "AcceptedAnswerId", "CreationDate", "Score" and the values listed below them (like the result you get when you query a database table). – Zack Nov 08 '14 at 20:51

1 Answer

xml.text <- 
'<?xml version="1.0" encoding="utf-8"?>
<posts>  
<row Id="1" PostTypeId="1" AcceptedAnswerId="17" CreationDate="2010-07-26T19:14:18.907" Score="6"/>
<row Id="2" PostTypeId="1" AcceptedAnswerId="17" CreationDate="2010-07-26T19:14:18.907" Score="6"/>
<row Id="3" PostTypeId="1" AcceptedAnswerId="17" CreationDate="2010-07-26T19:14:18.907" Score="6"/>
<row Id="4" PostTypeId="1" AcceptedAnswerId="17" CreationDate="2010-07-26T19:14:18.907" Score="6"/>
</posts>'

library(XML)
xml <- xmlParse(xml.text)
result <- as.data.frame(t(xmlSApply(xml["/posts/row"],xmlAttrs)),
                        stringsAsFactors=FALSE)
#   Id PostTypeId AcceptedAnswerId            CreationDate Score
# 1  1          1               17 2010-07-26T19:14:18.907     6
# 2  2          1               17 2010-07-26T19:14:18.907     6
# 3  3          1               17 2010-07-26T19:14:18.907     6
# 4  4          1               17 2010-07-26T19:14:18.907     6

This is a bit trickier than usual because the data is in attributes, not in node content (the nodes are empty), so unfortunately we can't use xmlToDataFrame(...).
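
To make the attribute/content distinction concrete, here is a minimal sketch on a cut-down, hypothetical one-row document (the name `doc` and the shortened attribute list are only for illustration). It assumes the XML package is installed and simply contrasts xmlValue(), which reads a node's text content, with xmlAttrs(), which reads its attributes.

```r
library(XML)

# Hypothetical one-row document with the same shape as the question's file
doc <- xmlParse('<posts><row Id="1" Score="6"/></posts>', asText = TRUE)
row <- getNodeSet(doc, "/posts/row")[[1]]

# The <row/> element has no text content, so there is nothing useful here
val <- xmlValue(row)

# The data lives in the attributes, returned as a named character vector
attrs <- xmlAttrs(row)
print(attrs)
```

This is probably why the xmlValue-based attempt in the question comes back without the expected columns: it reads content from nodes that have none.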

All the data above is still character, so you still need to convert the columns to whatever class is appropriate.
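
As a rough sketch of that conversion step, the snippet below rebuilds a two-row character data frame shaped like `result` (so it runs on its own, without the XML package) and converts each column in base R. The choice of integer for the ID and score columns, and of UTC for the timestamps, is an assumption; the source XML does not say which time zone the dates are in.

```r
# Stand-in for `result`: every column is character, as produced by xmlAttrs()
result <- data.frame(
  Id               = c("1", "2"),
  PostTypeId       = c("1", "1"),
  AcceptedAnswerId = c("17", "17"),
  CreationDate     = c("2010-07-26T19:14:18.907", "2010-07-26T19:14:18.907"),
  Score            = c("6", "6"),
  stringsAsFactors = FALSE
)

# Columns assumed to hold whole numbers
int_cols <- c("Id", "PostTypeId", "AcceptedAnswerId", "Score")
result[int_cols] <- lapply(result[int_cols], as.integer)

# %OS accepts fractional seconds on input; tz = "UTC" is an assumption
result$CreationDate <- as.POSIXct(result$CreationDate,
                                  format = "%Y-%m-%dT%H:%M:%OS",
                                  tz = "UTC")

str(result)
```

If some rows lack an attribute such as AcceptedAnswerId, the corresponding values would need separate handling before conversion.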

jlhoward
  • You could also use xmlAttrsToDataFrame. XML:::xmlAttrsToDataFrame(xmlRoot(xml)) – Chris S. Nov 17 '14 at 16:05
  • @ChrisS. - You should post this as an answer. I had no idea this function existed... I can't find it anywhere in the documentation. What other undocumented nuggets are there?? – jlhoward Nov 17 '14 at 16:25
  • I just started reading the new book XML and Web Technologies for Data Sciences with R and noticed it there, but I'm not sure why it's not better documented. – Chris S. Nov 17 '14 at 20:54