
I have an XML file; a shortened version is below:

<resultset>
  <row>
    <column name="indexpatient">2</column>
    <column name="height" null="true"></column>
    <column name="ParameterMeasure">Cardiac/MM/Dimension/LVIDd</column>
    <column name="ParameterId">MM/LVIDd</column>
    <column name="ResultIdentifier">Average</column>
    <column name="ResultValue">0.05617021151</column>
  </row>
  <row>
    <column name="indexpatient">2</column>
    <column name="height" null="true"></column>
    <column name="ParameterMeasure">Cardiac/MM/Dimension/LVIDd</column>
    <column name="ParameterId">MM/LVIDs</column>
    <column name="ResultIdentifier">Measurement No. 1</column>
    <column name="ResultValue">0.05341702</column>
  </row>
</resultset>

Ideally, each value of the name attribute (e.g. indexpatient) would become a column in a data frame, with one data frame row per <row> element holding the corresponding values.

Can anybody show me how to do this in R?

I am stuck because every child node has the same element name, 'column', and the nodes differ only in their name attribute.

user3919790
  • It will not be possible to directly convert this to the structure you desire. You will first need to extract the data, and will likely end up with every attribute as a column or row. But then restructuring is fairly easy. I can recommend the xml2 package for this kind of work. – vanao veneri Mar 17 '19 at 22:32
  • `xmlToDataFrame(txt)` from the *XML* package does a decent job minus naming the columns. – thelatemail Mar 17 '19 at 23:01
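
The "extract first, restructure afterwards" idea from the first comment can be sketched as follows. This is only an illustration, assuming xml2 is installed and every <row> has the same set of name attributes; the object names (doc, long, wide) and the trimmed sample XML are made up for the example.

```r
library(xml2)

# Trimmed sample with the same shape as the XML in the question
doc <- read_xml('<resultset>
  <row>
    <column name="indexpatient">2</column>
    <column name="height" null="true"></column>
    <column name="ResultValue">0.05617021151</column>
  </row>
  <row>
    <column name="indexpatient">2</column>
    <column name="height" null="true"></column>
    <column name="ResultValue">0.05341702</column>
  </row>
</resultset>')

rows <- xml_find_all(doc, "//row")
cols <- xml_find_all(doc, "//row/column")

# Step 1: extract a long table with one line per <column> node.
# xml_length() gives the number of children of each <row>, which
# lets us label every cell with the index of the row it came from.
long <- data.frame(
  row   = rep(seq_along(rows), xml_length(rows)),
  name  = xml_attr(cols, "name"),
  value = xml_text(cols),
  stringsAsFactors = FALSE
)

# Step 2: restructure to wide form, one column per name attribute
wide <- reshape(long, idvar = "row", timevar = "name", direction = "wide")
names(wide) <- sub("^value\\.", "", names(wide))
wide
```

All values come back as character strings here; the extra row column records which <row> each line came from and can be dropped if it is not needed.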

1 Answer


Here is a solution based on the answer to this question: R XML - combining parent and child nodes(w same name) into data frame

library(xml2)
library(dplyr)
page<-read_xml('<resultset>
  <row>
         <column name="indexpatient">2</column>
         <column name="height" null="true"></column>
         <column name="ParameterMeasure">Cardiac/MM/Dimension/LVIDd</column>
         <column name="ParameterId">MM/LVIDd</column>
         <column name="ResultIdentifier">Average</column>
         <column name="ResultValue">0.05617021151</column>
         </row>
         <row>
         <column name="indexpatient">2</column>
         <column name="height" null="true"></column>
         <column name="ParameterMeasure">Cardiac/MM/Dimension/LVIDd</column>
         <column name="ParameterId">MM/LVIDs</column>
         <column name="ResultIdentifier">Measurement No. 1</column>
         <column name="ResultValue">0.05341702</column>
         </row>
         </resultset>')


rows<- page %>% xml_find_all('//row') 

dfs<-lapply(rows, function(node){
   #find the attr value from all child nodes
   names<-node %>% xml_children() %>% xml_attr("name")  
   #find all values
   values<-node %>% xml_children() %>% xml_text()

   #create data frame and properly label the columns
   df<-data.frame(t(values), stringsAsFactors = FALSE)
   names(df)<-names
   df
})

#bind the per-row data frames together into the final data frame
answer<-bind_rows(dfs)
answer

# indexpatient height           ParameterMeasure ParameterId  ResultIdentifier   ResultValue
# 1            2        Cardiac/MM/Dimension/LVIDd    MM/LVIDd           Average 0.05617021151
# 2            2        Cardiac/MM/Dimension/LVIDd    MM/LVIDs Measurement No. 1    0.05341702
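
Note that in the result above every column is character and height is an empty string. A possible variation, sketched under the assumption that null="true" marks a missing value, sets those cells to NA and lets R guess the column types. The helper name row_to_df and the trimmed sample XML are hypothetical, chosen only for this illustration.

```r
library(xml2)

# Trimmed sample with the same shape as the XML in the question
doc <- read_xml('<resultset>
  <row>
    <column name="indexpatient">2</column>
    <column name="height" null="true"></column>
    <column name="ResultValue">0.05617021151</column>
  </row>
  <row>
    <column name="indexpatient">2</column>
    <column name="height" null="true"></column>
    <column name="ResultValue">0.05341702</column>
  </row>
</resultset>')

# Hypothetical helper: turn one <row> node into a one-row data frame
row_to_df <- function(node) {
  cols   <- xml_children(node)
  values <- xml_text(cols)
  # Assumption: cells flagged null="true" are missing, not empty strings
  is_null <- xml_attr(cols, "null") %in% "true"
  values[is_null] <- NA_character_
  df <- data.frame(t(values), stringsAsFactors = FALSE)
  names(df) <- xml_attr(cols, "name")
  df
}

rows   <- xml_find_all(doc, "//row")
result <- do.call(rbind, lapply(rows, row_to_df))

# Guess column types from the text, e.g. ResultValue becomes numeric
result[] <- lapply(result, type.convert, as.is = TRUE)
str(result)
```

This uses base rbind instead of bind_rows, so it assumes every row carries the same column names; with ragged rows, dplyr::bind_rows as in the answer above is the more forgiving choice.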
Dave2e