I'm having trouble parsing XML that is stored as text in a data frame column.

Reprex:

xml_content <- 
'<Content>
 <Components>
   <Component Quantity="3.891" PartNumber="ABS500" Designation="" Specification="AIMS05-01005" Attributes="{}" />
   <Component Quantity="1.109" PartNumber="ABS538" Designation="" Specification="AIMS05-12006" Attributes="{&quot;Tech&quot;:&quot;Metalic&quot;,&quot;Area&quot;:&quot;&quot;}" />
   <Component Quantity="1.639" PartNumber="Z24206" Designation="" Specification="DN-400" Attributes="{&quot;Tech&quot;:&quot;Composite&quot;,&quot;Area&quot;:&quot;&quot;}" />
 </Components>
 <Sharepoints>
   <Sharepoint DocumentId="11936" IdVersion="1536" Index="6">
   <BelongsTo>
   <BelongsToComponent ComponentGUID="f7d3c67d-55fe-411c-973a-cce844337f24" ComponentType="Formula" />
   </BelongsTo>
   </Sharepoint>
   <Sharepoint DocumentId="13195" IdVersion="1024" Index="B">
   <BelongsTo>
   <BelongsToComponent ComponentGUID="c455d81c-32f5-4e8a-815a-c32fde9efad9" ComponentType="Formula" />
   </BelongsTo>
   </Sharepoint>
 </Sharepoints>
</Content>'

df <- data.frame(
 part = c("A", "B"),
 name = c("Name of A", "Name of B"),
 content = c(xml_content, xml_content)
)

This function from xmlconvert returns an error when given the data frame column:

library(xmlconvert)
xml_to_df(xml_content, records.tag = "Component", fields = "attributes")
xml_to_df(df$content, records.tag = "Component", fields = "attributes")

Error in xml_to_df(df$content, records.tag = "Component", fields = "attributes") : File or URL ...

This flatxml function only works with a file:

library(flatxml)
fxml_importXMLFlat()

This XML function also returns an error:

library(XML)
xmlTreeParse(df$content)

Extra content at the end of the document Error: 1: Extra content at the end of the document

Does anyone know how to do this from a data frame? If I save the text in xml_content as a local .xml file, it works fine.

AleG

1 Answer

I solved it using xml2:

library(xml2)
library(magrittr)  # provides the %>% pipe used below

# Parse the XML string held in the first row of the data frame
data <- read_xml(df$content[1])

# Select every <Component> node, then read each attribute into a list
Component <- data %>% xml_find_all("//Component")
Quantity <- Component %>% xml_attr("Quantity") %>% as.numeric() %>% as.list()
PartNumber <- Component %>% xml_attr("PartNumber") %>% as.list()
Designation <- Component %>% xml_attr("Designation") %>% as.list()
Specification <- Component %>% xml_attr("Specification") %>% as.list()
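The code above only reads the first row. To cover every row of the data frame, one option is to wrap the same xml2 calls in a function and apply it row by row. This is a minimal sketch, not a tested solution for the real data: components_to_df is a hypothetical helper name, and xml_small is a shortened stand-in for xml_content from the question.

```r
library(xml2)

# Hypothetical helper (not part of xml2): parse one XML string and return
# the attributes of every <Component> node as a data frame.
components_to_df <- function(xml_string) {
  doc <- read_xml(xml_string)
  nodes <- xml_find_all(doc, "//Component")
  data.frame(
    Quantity      = as.numeric(xml_attr(nodes, "Quantity")),
    PartNumber    = xml_attr(nodes, "PartNumber"),
    Designation   = xml_attr(nodes, "Designation"),
    Specification = xml_attr(nodes, "Specification"),
    stringsAsFactors = FALSE
  )
}

# Shortened stand-in for the data frame in the question (assumption).
xml_small <- '<Content><Components>
  <Component Quantity="3.891" PartNumber="ABS500" Designation="" Specification="AIMS05-01005" />
  <Component Quantity="1.109" PartNumber="ABS538" Designation="" Specification="AIMS05-12006" />
</Components></Content>'
df <- data.frame(
  part = c("A", "B"),
  content = c(xml_small, xml_small),
  stringsAsFactors = FALSE
)

# Parse each row separately, label the result with its part, then stack.
per_row <- Map(function(part, xml_string) {
  cbind(part = part, components_to_df(xml_string), stringsAsFactors = FALSE)
}, df$part, df$content)
components <- do.call(rbind, unname(per_row))
rownames(components) <- NULL
```

Each XML string is parsed on its own, so the result has one row per Component, with the part column showing which data frame row it came from.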

This answer was helpful.
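The XML package from the question can also read strings if each one is parsed separately and marked as text. My guess, based only on the error message and not checked against the package source, is that passing the whole column hands the parser two documents at once, so it sees extra content after the first root element closes. A rough sketch, where xml_small and docs are shortened stand-ins for the question's data:

```r
library(XML)

# Shortened stand-in for df$content from the question (assumption).
xml_small <- '<Content><Components>
  <Component Quantity="3.891" PartNumber="ABS500" Designation="" Specification="AIMS05-01005" />
  <Component Quantity="1.109" PartNumber="ABS538" Designation="" Specification="AIMS05-12006" />
</Components></Content>'
docs <- c(xml_small, xml_small)

# Parse one string at a time; asText = TRUE tells the parser the input
# is XML content rather than a file name or URL.
parsed <- lapply(docs, xmlParse, asText = TRUE)

# For each document, collect the attributes of every <Component> node
# as a named character vector.
attrs <- lapply(parsed, function(doc) xpathApply(doc, "//Component", xmlAttrs))
```

Here attrs[[1]][[2]] holds the attributes of the second Component in the first document.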

AleG