3

I'm having my first go at transforming XML to an R data.frame and have found questions such as this one: How to transform XML data into a data.frame? very helpful, but still am unable to convert my piece of XML into a data.frame.

My aim is to make a plot of Euro to US Dollar exchange rates over time. The data is listed here in XML format:

http://www.ecb.europa.eu/stats/exchange/eurofxref/html/usd.xml

I'm able to read in the data and show which part of the data (node?) I'm interested in:

library(XML)
doc <- xmlTreeParse("http://www.ecb.europa.eu/stats/exchange/eurofxref/html/usd.xml")
root <- xmlRoot(doc)
root[[2]][[2]]

I've tried variations of getNodeSet() to show all of the lines that start with , but sofar to no avail:

getNodeSet(root, "/DataSet/Series/*")
getNodeSet(root, "//obs")
getNodeSet(root, "//obs[@OBS_VALUE = 1.1789]")

How do I go about to extract all or the variables TIME_PERIOD and OBS_VALUE from this XML file and put them into an R data.frame? Thanks already for any comments or clarifications.

Community
  • 1
  • 1
Richard
  • 1,224
  • 3
  • 16
  • 32
  • 1
    Tbh, I don't know why subsetting by xpaths doesn't work (?). An esoteric solution could be `t(sapply(xmlChildren(xmlRoot(doc)[[2]][[2]]), xmlAttrs))`. – lukeA Jan 11 '15 at 19:22

1 Answers1

4

This data is in sdmx format. You can use the R package rsdmx to parse the data:

library(rsdmx)
appData <- readSDMX("http://www.ecb.europa.eu/stats/exchange/eurofxref/html/usd.xml")
myData <- as.data.frame(appData)

> head(myData)
FREQ CURRENCY CURRENCY_DENOM EXR_TYPE EXR_SUFFIX TIME_FORMAT COLLECTION TIME_PERIOD OBS_VALUE OBS_STATUS OBS_CONF
1    D      USD            EUR     SP00          A         P1D          A  1999-01-04    1.1789          A        F
2    D      USD            EUR     SP00          A         P1D          A  1999-01-05    1.1790          A        F
3    D      USD            EUR     SP00          A         P1D          A  1999-01-06    1.1743          A        F
4    D      USD            EUR     SP00          A         P1D          A  1999-01-07    1.1632          A        F
5    D      USD            EUR     SP00          A         P1D          A  1999-01-08    1.1659          A        F
6    D      USD            EUR     SP00          A         P1D          A  1999-01-11    1.1569          A        F

Alternatively if you just have the XML package to hand:

doc <- xmlParse("http://www.ecb.europa.eu/stats/exchange/eurofxref/html/usd.xml")
docData <- getNodeSet(doc, "//ns:Obs"
                      , namespaces = c(ns = "http://www.ecb.europa.eu/vocabulary/stats/exr/1")
                      , fun = xmlAttrs)
docData <- do.call(rbind, docData)
> head(docData)
TIME_PERIOD  OBS_VALUE OBS_STATUS OBS_CONF
[1,] "1999-01-04" "1.1789"  "A"        "F"     
[2,] "1999-01-05" "1.1790"  "A"        "F"     
[3,] "1999-01-06" "1.1743"  "A"        "F"     
[4,] "1999-01-07" "1.1632"  "A"        "F"     
[5,] "1999-01-08" "1.1659"  "A"        "F"     
[6,] "1999-01-11" "1.1569"  "A"        "F" 
jdharrison
  • 30,085
  • 4
  • 77
  • 89
  • oh my, at times like these I love R so much ;) Thank you very much @jdharrison! – Richard Jan 11 '15 at 19:35
  • 1
    Thanks @jdharrisson for sharing this example. I also recommend to have a look to the rsdmx project documentation https://github.com/opensdmx/rsdmx/wiki which gives additional examples on how to you use rsdmx. Hope this helps – eblondel Jan 19 '15 at 21:17