I have received a set of XML files and I'm trying to convert them to data frames in R. The problem is that their structure seems different from the ones I see in other questions asked online, so I have no idea how to solve this. I used the XML library.

library(XML)
doc <- xmlParse("ULS_rows.xml")
xmltop <- xmlRoot(doc)
class(xmltop)
[1] "XMLInternalElementNode" "XMLInternalNode" "XMLAbstractNode"

xmlName(xmltop) 
[1] "Workbook"

xmlName(xmltop[[1]])  # name of the root's first child
[1] "Styles"

When I check the second child, I get something like this, which corresponds more or less to the content of the first sheet in the XML file:

<Worksheet ss:Name="TOC">
  <Names>
    <NamedRange ss:Name="Print_Titles" ss:RefersTo="=TOC!R1"/>
  </Names>
  <Table>
    <Row ss:StyleID="HeaderStyle">
      <Cell>
        <Data ss:Type="String">Sheet Name</Data>
      </Cell>
      <Cell>
        <Data ss:Type="String">Description</Data>
      </Cell>
    </Row>
    <Row>
      <Cell ss:StyleID="HyperlinkStyle" ss:HRef="#MemCheROWSru1MemResBri!A1">
        <Data ss:Type="String">MemCheROWSru1MemResBri</Data>
      </Cell>
      <Cell>
        <Data ss:Type="String">Member_Check_ROWS.run(1) : Member Result Brief</Data>
      </Cell>
    </Row>
    <Row>
      <Cell ss:StyleID="HyperlinkStyle" ss:HRef="#MeChROWSr1NoMeRe201!A1">
        <Data ss:Type="String">MeChROWSr1NoMeRe201</Data>
      </Cell>
      <Cell>
        <Data ss:Type="String">Member_Check_ROWS.run(1) : Norsok Member Result 2013</Data>
      </Cell>
    </Row>
    <Row>
      <Cell ss:StyleID="HyperlinkStyle" ss:HRef="#MeChROWSr1NoCoRe201!A1">
        <Data ss:Type="String">MeChROWSr1NoCoRe201</Data>
      </Cell>
      <Cell>
        <Data ss:Type="String">Member_Check_ROWS.run(1) : Norsok Cone Result 2013</Data>
      </Cell>
    </Row>
    <Row>
      <Cell ss:StyleID="HyperlinkStyle" ss:HRef="#MemCheROWSru1JoiResBri!A1">
        <Data ss:Type="String">MemCheROWSru1JoiResBri</Data>
      </Cell>
      <Cell>
        <Data ss:Type="String">Member_Check_ROWS.run(1) : Joint Result Brief</Data>
      </Cell>
    </Row>
    <Row>
      <Cell ss:StyleID="HyperlinkStyle" ss:HRef="#MeChROWSr1NoJoRe201!A1">
        <Data ss:Type="String">MeChROWSr1NoJoRe201</Data>
      </Cell>
      <Cell>
        <Data ss:Type="String">Member_Check_ROWS.run(1) : Norsok Joint Result 2013</Data>
      </Cell>
    </Row>
  </Table>
</Worksheet> 

Clueless, I have tried to more or less blindly follow instructions I found online, but code such as:

ffgg <- xmlSApply(xmltop, function(x) xmlSApply(x, xmlValue))
ffgg_df <- data.frame(t(ffgg),row.names=NULL)

gives nothing close to a data frame. Any advice on what the problem is? Am I dealing with a regular XML file, or am I missing something? Thanks

  • Is the last solution listed here (http://stackoverflow.com/questions/17198658/how-to-parse-xml-to-r-data-frame) helpful? I'm referring to the solution that starts "Use xpath ..." – Fred Boehm May 16 '17 at 15:48
  • There is an accepted answer for [this](http://stackoverflow.com/questions/17198658/how-to-parse-xml-to-r-data-frame) question that explains the use of the `XML` library. – Erik Schutte May 16 '17 at 15:50
  • Wait, is this just an xlsx file? If so, use an Excel reader like readxl or openxlsx. – alistaire May 16 '17 at 17:14
  • Alistaire, it's not strictly an Excel file, but I can open and view it in Excel. Will try. Thanks – La Machine Infernale May 16 '17 at 19:36
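
For what it's worth, the Workbook / Worksheet / Table / Row / Cell / Data layout shown in the question looks like the Excel 2003 "XML Spreadsheet" format, which would also explain why Excel opens it. If that is the case, the XPath route suggested in the comments could be sketched roughly as below with the same XML package. This is only a sketch under assumptions: the namespace URI is the usual one for that format but is not visible in the excerpt, the inline `xml_text` is a shortened stand-in for "ULS_rows.xml", the first row of each sheet is treated as the header, and `row_values` and `sheet_to_df` are hypothetical helper names. It also ignores `ss:Index` attributes, which that format may use to skip empty cells.

```r
library(XML)

# Shortened stand-in for the real file. The xmlns declarations are an
# assumption; check the top of your own file for the actual URI.
xml_text <- '<?xml version="1.0"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
          xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
  <Worksheet ss:Name="TOC">
    <Table>
      <Row ss:StyleID="HeaderStyle">
        <Cell><Data ss:Type="String">Sheet Name</Data></Cell>
        <Cell><Data ss:Type="String">Description</Data></Cell>
      </Row>
      <Row>
        <Cell><Data ss:Type="String">MemCheROWSru1MemResBri</Data></Cell>
        <Cell><Data ss:Type="String">Member_Check_ROWS.run(1) : Member Result Brief</Data></Cell>
      </Row>
      <Row>
        <Cell><Data ss:Type="String">MeChROWSr1NoMeRe201</Data></Cell>
        <Cell><Data ss:Type="String">Member_Check_ROWS.run(1) : Norsok Member Result 2013</Data></Cell>
      </Row>
    </Table>
  </Worksheet>
</Workbook>'

doc <- xmlParse(xml_text, asText = TRUE)  # for a file: xmlParse("ULS_rows.xml")

# Prefix-to-URI mapping used in the XPath expressions (assumed URI).
ns <- c(ss = "urn:schemas-microsoft-com:office:spreadsheet")

# Text of every cell in one <Row>; a <Cell> without <Data> becomes NA.
row_values <- function(row) {
  cells <- getNodeSet(row, "./ss:Cell", namespaces = ns)
  vapply(cells, function(cell) {
    data <- getNodeSet(cell, "./ss:Data", namespaces = ns)
    if (length(data) == 0) NA_character_ else xmlValue(data[[1]])
  }, character(1))
}

# One <Worksheet> -> data frame, using the first row as column names.
sheet_to_df <- function(sheet) {
  rows <- getNodeSet(sheet, "./ss:Table/ss:Row", namespaces = ns)
  values <- lapply(rows, row_values)
  width <- max(lengths(values))
  # Pad short rows with NA so all rows have the same number of columns.
  values <- lapply(values, function(v) c(v, rep(NA_character_, width - length(v))))
  mat <- do.call(rbind, values)
  df <- as.data.frame(mat[-1, , drop = FALSE], stringsAsFactors = FALSE)
  names(df) <- mat[1, ]
  df
}

sheets <- getNodeSet(doc, "//ss:Worksheet", namespaces = ns)
dfs <- lapply(sheets, sheet_to_df)  # one data frame per worksheet
toc <- dfs[[1]]
```

The nested `xmlSApply(..., xmlValue)` call from the question collapses all the text under each top-level child into a single string, so the row and cell boundaries are lost; walking Row and Cell nodes explicitly keeps them. All values come back as character, so numeric columns would still need converting afterwards.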
