0

I would like to convert the following xml file to a R data frame (https://www.dropbox.com/s/wwejpr0nfcj6gnw/20130919_20130920_SLD_FCST_ACTUAL_20210507_15_57_58_v1.xml?dl=1)

I have tried to use the XML package and xml2 package but had no luck. I am new to xml and would appreciate any help.

In the XML package I have tried xmlParse and xmlToDataFrame. In the xml2 package I have tried read_xml and xml_find_all functions and none of them worked.

Update: The solution provided by Onyambu below works for me but if other community members have other ways of solving this problem, please feel free to provide other solutions.

Rav
  • 3
  • 4
  • Does this answer your question? [R: convert XML data to data frame](https://stackoverflow.com/questions/33446888/r-convert-xml-data-to-data-frame) – maydin May 08 '21 at 00:45

2 Answers2

1

One way is to do:

a <- 'https://www.dropbox.com/s/wwejpr0nfcj6gnw/20130919_20130920_SLD_FCST_ACTUAL_20210507_15_57_58_v1.xml?dl=1'

library(purrr)
library(rvest)

h <- c("data_item", "resource_name", "opr_date", "interval_num", 
       "interval_start_gmt", "interval_end_gmt", "value")

read_html(a) %>%
  html_nodes("rto report_item")%>%
  map_dfr(~map_dfc(setNames(h, h), 
               ~html_text(html_nodes(.y, .x)), .y = .x), .id = "grp")

 A tibble: 120 x 8
   grp   data_item  resource_name opr_date interval_num interval_start_~ interval_end_gmt value
   <chr> <chr>      <chr>         <chr>    <chr>        <chr>            <chr>            <chr>
 1 1     SYS_FCST_~ CA ISO-TAC    2013-09~ 7            2013-09-19T13:0~ 2013-09-19T14:0~ 20425
 2 1     SYS_FCST_~ CA ISO-TAC    2013-09~ 24           2013-09-20T06:0~ 2013-09-20T07:0~ 20424
 3 1     SYS_FCST_~ CA ISO-TAC    2013-09~ 12           2013-09-19T18:0~ 2013-09-19T19:0~ 20425
 4 1     SYS_FCST_~ CA ISO-TAC    2013-09~ 14           2013-09-19T20:0~ 2013-09-19T21:0~ 20425
 5 1     SYS_FCST_~ CA ISO-TAC    2013-09~ 18           2013-09-20T00:0~ 2013-09-20T01:0~ 20458
 6 1     SYS_FCST_~ CA ISO-TAC    2013-09~ 19           2013-09-20T01:0~ 2013-09-20T02:0~ 20454
 7 1     SYS_FCST_~ CA ISO-TAC    2013-09~ 21           2013-09-20T03:0~ 2013-09-20T04:0~ 20443
 8 1     SYS_FCST_~ CA ISO-TAC    2013-09~ 23           2013-09-20T05:0~ 2013-09-20T06:0~ 20430
 9 1     SYS_FCST_~ CA ISO-TAC    2013-09~ 4            2013-09-19T10:0~ 2013-09-19T11:0~ 20425
10 1     SYS_FCST_~ CA ISO-TAC    2013-09~ 5            2013-09-19T11:0~ 2013-09-19T12:0~ 20425
# ... with 110 more rows
Onyambu
  • 67,392
  • 3
  • 24
  • 53
  • Thank you Onyambu for your help. This is exactly what I need. It will take me while to understand what you did but it is elegant enough that I should be able to understand it. Once again thank you :) – Rav May 08 '21 at 03:08
0

I recommend you convert the XML file into a CVS file (you could do this in excel), and then use the command read.csv().