I have various XML files with information as shown below. I'm having difficulty parsing this variable XML format into a dataframe in a way that handles both differing numbers of metrics and repeated Properties tags.

  <ProducedFruits>
    <FruitType>
      <FruitName>Apple</FruitName>
      <FruitMetrics>
        <Properties Sugars="27.51" Rate="5.03" />
        <Properties Sugars="219.39" Rate="12.19" />
        <Properties Sugars="266.34" Rate="75.9" />
      </FruitMetrics>
    </FruitType>
    <FruitType>
      <FruitName>Lime</FruitName>
      <FruitMetrics>
        <Properties Sugars="1884.2" Rate="5" />
        <Properties Sugars="1884.2" Rate="98.3" />
      </FruitMetrics>
    </FruitType>
    <FruitType>
      <FruitName>Lemon</FruitName>
      <FruitMetrics>
        <Properties Sugars="1064.77" Rate="5" />
        <Properties Sugars="1064.77" Rate="56" />
      </FruitMetrics>
    </FruitType>
    <FruitType>
      <FruitName>Banana</FruitName>
      <FruitMetrics>
        <Properties Sugars="113" Rate="12" />
        <Properties Sugars="113" Rate="79" />
      </FruitMetrics>
    </FruitType>
  </ProducedFruits>

Each file may be somewhat different, so ideally I would like to create something that handles the inconsistent number of values, preserves the FruitName, and creates a dataframe like the one at the bottom.

[Image: the desired dataframe]

1 Answer

To read your XML into R as a dataframe you can use the XML package (https://cran.r-project.org/web/packages/XML/). For example, parse the file with data <- XML::xmlParse("doc.xml"), convert the parsed document to a nested list with xml_data <- XML::xmlToList(data), and then coerce that list with xml_df <- as.data.frame(xml_data) (per: How to parse XML to R data frame).
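
Because the list returned by xmlToList is nested and each fruit has a different number of Properties entries, it may be easier to build the rows directly from the nodes. The following is a minimal sketch, not a definitive solution: it assumes the desired result is one row per Properties element with the FruitName repeated, and that Sugars and Rate are the only attributes of interest. The names xml_text, fruit_nodes, rows and fruit_df are made up for this illustration, and a shortened copy of the sample is inlined so the block runs on its own.

```r
library(XML)

# Shortened copy of the sample; for a real file use xmlParse("doc.xml") instead.
xml_text <- '<ProducedFruits>
  <FruitType>
    <FruitName>Apple</FruitName>
    <FruitMetrics>
      <Properties Sugars="27.51" Rate="5.03" />
      <Properties Sugars="219.39" Rate="12.19" />
    </FruitMetrics>
  </FruitType>
  <FruitType>
    <FruitName>Lime</FruitName>
    <FruitMetrics>
      <Properties Sugars="1884.2" Rate="5" />
    </FruitMetrics>
  </FruitType>
</ProducedFruits>'

doc <- xmlParse(xml_text, asText = TRUE)

# One node per fruit, however many there are in the file.
fruit_nodes <- getNodeSet(doc, "//FruitType")

# Build a small data frame per fruit: one row per Properties element.
rows <- lapply(fruit_nodes, function(fruit) {
  name  <- xpathSApply(fruit, "./FruitName", xmlValue)
  props <- getNodeSet(fruit, "./FruitMetrics/Properties")
  data.frame(
    FruitName = rep(name, length(props)),
    # default = NA keeps a row even if an attribute is missing
    Sugars = as.numeric(sapply(props, xmlGetAttr, name = "Sugars", default = NA)),
    Rate   = as.numeric(sapply(props, xmlGetAttr, name = "Rate", default = NA)),
    stringsAsFactors = FALSE
  )
})

# Stack the per-fruit frames into a single long data frame.
fruit_df <- do.call(rbind, rows)
print(fruit_df)
```

The long layout is chosen here because it does not depend on how many Properties elements a fruit has; if a wide layout is needed, it can be reshaped afterwards.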

jared_mamrot