I'm new to R and looking for some help. How do I convert the following XML to a data frame in R? The data frame should contain three columns, one for each column id.

<?xml version="1.0" encoding="UTF-8" ?>
<results version="1" total-rows="3" current-page="1" current-page-start-row="1" current-page-end-row="25" execution-time="0.0781255">
  <columns>
    <column id="ReferenceNumber" data-type="ReferenceNumber">Reference Number</column>
    <column id="AllocatedTo" data-type="Allocation">Allocated To</column>
    <column id="Reason" data-type="Category">Category Code</column>
  </columns>
  <rows>
    <row case-reference="0150967018">
      <data column-id="ReferenceNumber">0150967018</data>
      <data column-id="AllocatedTo">Suresh</data>
      <data column-id="Reason">Actioned incorrectly</data>
    </row>
    <row case-reference="0150967118">
      <data column-id="ReferenceNumber">0150967118</data>
      <data column-id="AllocatedTo">Suresh</data>
      <data column-id="Reason">Actioned incorrectly</data>
    </row>
    <row case-reference="0150967218">
      <data column-id="ReferenceNumber">0150967218</data>
      <data column-id="AllocatedTo">Suresh</data>
      <data column-id="Reason">Actioned incorrectly</data>
    </row>
  </rows>
</results>

2 Answers

library(xml2)
library(dplyr)

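# Assign your XML string (or the path to your XML file) to xml_string first.
xml_doc <- read_xml(xml_string)

# Collect every <data> cell in document order and fill a 3-column matrix row by row.
df <- xml_doc %>%
  xml_find_all("//rows/row/data") %>%
  xml_text() %>%
  matrix(ncol = 3, byrow = TRUE) %>%
  as.data.frame(stringsAsFactors = FALSE)

# Name the columns after the display text of each <column> element.
colnames(df) <- xml_doc %>%
  xml_find_all("//columns/column") %>%
  xml_text()
df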

Output is:

  Reference Number Allocated To        Category Code
1       0150967018       Suresh Actioned incorrectly
2       0150967118       Suresh Actioned incorrectly
3       0150967218       Suresh Actioned incorrectly
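
Since the question asks for columns named after each column id, here is a minimal sketch of a variant that reads the `id` attribute instead of the display text and matches each cell to its column through `column-id`, so it does not depend on the cells appearing in the same order in every row. The names `xml_string`, `ids`, `rows`, `cells` and `df_by_id` are introduced here for illustration, and the XML is inlined only to keep the sketch self-contained.

```r
library(xml2)

# The XML from the question, inlined so this sketch runs on its own.
xml_string <- '<?xml version="1.0" encoding="UTF-8" ?>
<results version="1" total-rows="3">
  <columns>
    <column id="ReferenceNumber" data-type="ReferenceNumber">Reference Number</column>
    <column id="AllocatedTo" data-type="Allocation">Allocated To</column>
    <column id="Reason" data-type="Category">Category Code</column>
  </columns>
  <rows>
    <row case-reference="0150967018">
      <data column-id="ReferenceNumber">0150967018</data>
      <data column-id="AllocatedTo">Suresh</data>
      <data column-id="Reason">Actioned incorrectly</data>
    </row>
    <row case-reference="0150967118">
      <data column-id="ReferenceNumber">0150967118</data>
      <data column-id="AllocatedTo">Suresh</data>
      <data column-id="Reason">Actioned incorrectly</data>
    </row>
    <row case-reference="0150967218">
      <data column-id="ReferenceNumber">0150967218</data>
      <data column-id="AllocatedTo">Suresh</data>
      <data column-id="Reason">Actioned incorrectly</data>
    </row>
  </rows>
</results>'

xml_doc <- read_xml(xml_string)

# Column ids come from the id attribute of each <column> element.
ids <- xml_attr(xml_find_all(xml_doc, "//columns/column"), "id")
rows <- xml_find_all(xml_doc, "//rows/row")

# For each id, take the matching <data> cell from every row.
# xml_find_first() returns one result per row, so a missing cell becomes NA.
cells <- lapply(ids, function(id) {
  xml_text(xml_find_first(rows, sprintf("./data[@column-id='%s']", id)))
})

df_by_id <- as.data.frame(setNames(cells, ids), stringsAsFactors = FALSE)
df_by_id
```

The design choice here is to look cells up by attribute rather than by position, which costs one XPath query per column but avoids hard-coding `ncol = 3`.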

XML to data frame: to handle the data in large files effectively, read the XML file into a data frame, then process the data frame for analysis.

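# Load the package required to read XML files.
library("XML")

# The cells here are repeated <data> elements rather than one uniquely named
# child per field, so parse the file and pull them out with XPath.
doc <- xmlParse("input.xml")
values <- xpathSApply(doc, "//rows/row/data", xmlValue)

# Reshape into one row per <row> and name the columns after the column ids.
xmldataframe <- as.data.frame(matrix(values, ncol = 3, byrow = TRUE), stringsAsFactors = FALSE)
names(xmldataframe) <- xpathSApply(doc, "//columns/column", xmlGetAttr, "id")
print(xmldataframe)

When we execute the above code, it produces the following result:

  ReferenceNumber AllocatedTo               Reason
1      0150967018      Suresh Actioned incorrectly
2      0150967118      Suresh Actioned incorrectly
3      0150967218      Suresh Actioned incorrectly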

Now that the data is available as a data frame, we can use the usual data frame functions to read and manipulate it.
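
As a rough illustration of that kind of follow-up work, the sketch below filters, counts and looks up rows using base R only. The data frame `cases` is a hypothetical stand-in built by hand from the values in the question, so the snippet runs without any XML file; the other variable names are likewise introduced here.

```r
# Hypothetical stand-in for the data frame built from the question's XML.
cases <- data.frame(
  ReferenceNumber = c("0150967018", "0150967118", "0150967218"),
  AllocatedTo = c("Suresh", "Suresh", "Suresh"),
  Reason = rep("Actioned incorrectly", 3),
  stringsAsFactors = FALSE
)

# Keep only the cases allocated to one person.
suresh_cases <- subset(cases, AllocatedTo == "Suresh")

# Count how many cases fall under each reason.
reason_counts <- table(cases$Reason)

# Look up a single case by its reference number.
one_case <- cases[cases$ReferenceNumber == "0150967118", ]
```

Keeping the reference numbers as character strings preserves their leading zeros, which would be lost if the column were converted to numeric.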
