
I have an XML file and I am trying to transform it into a DF (data frame) by choosing a specific node. My XML:

<?xml version="1.0" encoding="ISO-8859-1" ?>


<test:TASS xmlns="http://www.vvv.com/schemas"  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"  xsi:schemaLocation="http://www.vvv.com/schemas http://www.vvv.com/schemas/testV2_02_03.xsd"  xmlns:test="http://www.vvv.com/schemas" >
    <test:house>
                <test:billing>
                    <test:proceduresummary>
                        <test:guidenumber>X2030</test:guidenumber>
                            <test:diagnosis>
                                <test:table>ICD-10</test:table>
                                <test:diagnosiscod>J441</test:diagnosiscod>
                                <test:description>CHRONIC OBSTRUCTIVE PULMONARY DISEASE WITH (ACUTE) EXACERBATION</test:description>
                            </test:diagnosis>
                            <test:procedure>
                                <test:procedure>
                                    <test:description>HOSPITAL</test:description>
                                </test:procedure>
                                <test:amount>12</test:amount>
                            </test:procedure>
                    </test:proceduresummary>
                </test:billing>
                <test:billing>
                    <test:proceduresummary>
                        <test:guidenumber>Y6055</test:guidenumber>
                            <test:diagnosis>
                                <test:table>ICD-10</test:table>
                                <test:diagnosiscod>I21</test:diagnosiscod>
                                <test:description>ACUTE MYOCARDIAL INFARCTION</test:description>
                            </test:diagnosis>
                            <test:procedure>
                                <test:procedure>
                                    <test:description>HOSPITAL</test:description>
                                </test:procedure>
                                <test:amount>8</test:amount>
                            </test:procedure>
                    </test:proceduresummary>
                </test:billing>
                <test:billing>
                    <test:proceduresummary>
                        <test:guidenumber>Z9088</test:guidenumber>
                            <test:diagnosis>
                                <test:table>ICD-10</test:table>
                                <test:diagnosiscod>F20</test:diagnosiscod>
                                <test:description>SCHIZOPHRENIA</test:description>
                            </test:diagnosis>
                            <test:procedure>
                                <test:procedure>
                                    <test:description>HOSPITAL</test:description>
                                </test:procedure>
                                <test:amount>1</test:amount>
                            </test:procedure>
                    </test:proceduresummary>
                </test:billing>
    </test:house>
</test:TASS>

For each guidenumber node, I want to extract the corresponding diagnosiscod and description values. I would like the result to be a DF like the one below:

guidenumber <- c('X2030','Y6055','Z9088')
diagnosiscod <- c('J441','I21','F20')
description <- c('CHRONIC OBSTRUCTIVE PULMONARY DISEASE WITH (ACUTE) EXACERBATION','ACUTE MYOCARDIAL INFARCTION','SCHIZOPHRENIA')
df<- data.frame(guidenumber,diagnosiscod,description)

I tried:

require(tidyverse)
require(xml2)
setwd("D:/")
myxml<- read_xml("base.xml")
house <- myxml %>% xml_find_all("//house")

I used this question as a reference: Meteorological Data from XML to Dataframe in R.

That example is what I need to solve my problem, but in my case the result comes back empty.

How can I solve this problem so that I can proceed and turn my XML into a DF?

Bruno Avila
  • I've been trying for several days, and I already posted a question here, but it was considered too generic. That's why I made this post more detailed. I have tried both XML and xml2. I also tried this example without success (https://stackoverflow.com/questions/33446888/r-convert-xml-data-to-data-frame). That's why I asked for help here. – Bruno Avila Dec 12 '19 at 15:28
  • @Parfait I have improved my question once again with an example of my attempt. – Bruno Avila Dec 12 '19 at 18:37


You were on the right track; your issue was identifying the node names incorrectly. In this case every element name starts with the "test:" namespace prefix, so the XPath has to include it (for example "//test:house" rather than "//house").
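A minimal sketch of why the unprefixed "//house" query from the question matches nothing: the elements live in the http://www.vvv.com/schemas namespace, and in XPath 1.0 a name without a prefix only matches elements that are in no namespace. The trimmed XML string and the prefix `t` are my own choices for illustration; the namespace is bound explicitly through the `ns` argument of `xml_find_all()` instead of relying on the prefixes xml2 picks up from the document.

```r
library(xml2)

# Trimmed copy of the question's XML (one record, same namespace declarations)
doc <- read_xml('<test:TASS xmlns="http://www.vvv.com/schemas" xmlns:test="http://www.vvv.com/schemas">
  <test:house>
    <test:billing>
      <test:proceduresummary>
        <test:guidenumber>X2030</test:guidenumber>
      </test:proceduresummary>
    </test:billing>
  </test:house>
</test:TASS>')

# Explicit prefix -> URI mapping; "t" is an arbitrary prefix chosen for this sketch
ns <- c(t = "http://www.vvv.com/schemas")

# Unprefixed name: only matches elements in no namespace, so nothing is found
length(xml_find_all(doc, "//house", ns))                    # 0

# Prefixed name bound to the document's namespace URI: the node is found
length(xml_find_all(doc, "//t:house", ns))                  # 1

# Namespace-agnostic alternative using the standard XPath local-name() function
length(xml_find_all(doc, "//*[local-name()='house']", ns))  # 1
```

The explicit `ns` vector is a design choice: it keeps the query working even if the document uses a different prefix for the same namespace URI.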

library(xml2)
library(magrittr)  # provides the %>% pipe used below

# read the XML from the question (the question reads it with read_xml("base.xml"))
myxml<-read_xml(' **.... Reading file from above.....** ')

#strip namespaces.  #not needed in this case
#xml_ns_strip(myxml)

#find high level node containing all of the requested information
procedures<-myxml %>% xml_find_all(".//test:proceduresummary")

#extract the requested information from each node
#assumes only 1 subnode per parent
guidenumber<- procedures %>% xml_find_first(".//test:guidenumber") %>% xml_text()
#each proceduresummary has 2 description sub-subnodes (one under diagnosis, one under procedure),
#so go through the diagnosis subnode when selecting diagnosiscod and description
diagnosiscod <- procedures %>% xml_find_first(".//test:diagnosis/test:diagnosiscod") %>% xml_text()
description<- procedures %>% xml_find_first(".//test:diagnosis/test:description") %>% xml_text()

answer<-data.frame(guidenumber, diagnosiscod, description)
head(answer)

Remember, this assumes only one piece of requested information per "proceduresummary" node. If there is more than one guide number, diagnosis, etc. per node, this method will only select the first one. Good luck.
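If a proceduresummary can hold several diagnosis nodes, one option is to start from the diagnosis nodes instead and look back up to the guide number, which gives one row per diagnosis. This is a sketch under assumptions: the input below is hypothetical (the question's X2030 record with the Y6055 diagnosis copied in as a second diagnosis), and `t` is an arbitrary prefix bound explicitly through the `ns` argument.

```r
library(xml2)

# Explicit namespace mapping; the prefix "t" is an arbitrary choice for this sketch
ns <- c(t = "http://www.vvv.com/schemas")

# Hypothetical input: one proceduresummary containing two diagnosis nodes
doc <- read_xml('<test:TASS xmlns:test="http://www.vvv.com/schemas">
  <test:house>
    <test:billing>
      <test:proceduresummary>
        <test:guidenumber>X2030</test:guidenumber>
        <test:diagnosis>
          <test:table>ICD-10</test:table>
          <test:diagnosiscod>J441</test:diagnosiscod>
          <test:description>CHRONIC OBSTRUCTIVE PULMONARY DISEASE WITH (ACUTE) EXACERBATION</test:description>
        </test:diagnosis>
        <test:diagnosis>
          <test:table>ICD-10</test:table>
          <test:diagnosiscod>I21</test:diagnosiscod>
          <test:description>ACUTE MYOCARDIAL INFARCTION</test:description>
        </test:diagnosis>
      </test:proceduresummary>
    </test:billing>
  </test:house>
</test:TASS>')

# One entry per diagnosis node, wherever it appears in the document
diagnoses <- xml_find_all(doc, "//t:diagnosis", ns)

# For each diagnosis: the guide number is a child of the parent proceduresummary (".."),
# while diagnosiscod and description are direct children of the diagnosis itself
result <- data.frame(
  guidenumber  = xml_text(xml_find_first(diagnoses, "../t:guidenumber", ns)),
  diagnosiscod = xml_text(xml_find_first(diagnoses, "t:diagnosiscod", ns)),
  description  = xml_text(xml_find_first(diagnoses, "t:description", ns))
)
result
```

With this input the result has two rows, both with guidenumber X2030. Because each record in the question's XML has a single diagnosis, running the same queries on it should give the same three rows as the code above; selecting "t:description" as a direct child of diagnosis also keeps the HOSPITAL description under procedure out of the result.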

Dave2e