I’m new to pulling XML data into R and am using the code below. Every field comes through correctly except the NEW_DATE column: for the Id = 1 row, NEW_DATE is 852163200000 instead of the 1997-01-02T00:00:00 shown in the original XML below. When I parse the session response, NEW_DATE comes back as a character value that I can’t interpret. The only change I made to the code for this post was replacing the proxy URL with # placeholders.
Any help is greatly appreciated!
library(XML)
library(RCurl)
library(xml2)
library(httr)
library(rvest)
library(dplyr)
library(tidyverse)
#set up proxy
my_proxy = use_proxy(url="##.#.##.##:####")
#set up session and response
my_session = html_session("https://data.treasury.gov/feed.svc/DailyTreasuryYieldCurveRateData",my_proxy)
my_response = my_session$response
#check status
status_code(my_session)
status_code(my_response)
#retrieve content XML
content_parsed = content(my_session$response, as = "parsed")
#convert list to data frame
ust.df = data.frame(t(sapply(content_parsed$d,c)))
#the xs:dateTime data type represents date and time as YYYY-MM-DDThh:mm:ss
#list column names
colnames(ust.df)
#remove X__metadata column
ust.df = ust.df %>%
  select(-1)
#replace Date with "" in NEW_DATE column
ust.df$NEW_DATE = gsub("Date", "", paste(ust.df$NEW_DATE))
#replace (,),/ with "" in NEW_DATE column
ust.df$NEW_DATE = gsub("[[:punct:]]", "", ust.df$NEW_DATE)
#fix $NEW_DATE format -- 12 digits
#Id =1 NEW_DATE = 852163200000 instead of 1997-01-02T00:00:00 listed below
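One possible reading of the 12-digit value, offered as an assumption rather than something confirmed by the feed: it may be a count of milliseconds since 1970-01-01 00:00:00 UTC, which is how dates of the form /Date(...)/ are commonly encoded. Under that assumption, 852163200000 ms is 852163200 seconds, and the sketch below converts it back. The names raw_date, ms, and new_date are only for illustration and are not part of the code above.

```r
# Assumption: the digits are milliseconds since 1970-01-01 00:00:00 UTC.
# raw_date mimics the unprocessed NEW_DATE string for the Id = 1 row.
raw_date <- "/Date(852163200000)/"

# Keep only the digits, then convert to a number of milliseconds
ms <- as.numeric(gsub("[^0-9]", "", raw_date))

# Convert milliseconds to seconds and interpret as a UTC date-time
new_date <- as.POSIXct(ms / 1000, origin = "1970-01-01", tz = "UTC")

# Print in the same layout as the XML value
format(new_date, "%Y-%m-%dT%H:%M:%S")
# → "1997-01-02T00:00:00"
```

If that assumption holds, the same conversion could be applied to the whole column, for example as.POSIXct(as.numeric(ust.df$NEW_DATE) / 1000, origin = "1970-01-01", tz = "UTC") after the gsub steps above.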
XML sample for the Id = 1 entry, for reference:
<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices" xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata" xml:base="http://data.treasury.gov/Feed.svc/">
<title type="text">DailyTreasuryYieldCurveRateData</title>
<id>http://data.treasury.gov/feed.svc/DailyTreasuryYieldCurveRateData</id>
<updated>2021-03-11T17:17:56Z</updated>
<link rel="self" title="DailyTreasuryYieldCurveRateData" href="DailyTreasuryYieldCurveRateData" />
<entry>
<id>http://data.treasury.gov/Feed.svc/DailyTreasuryYieldCurveRateData(1)</id>
<title type="text" />
<updated>2021-03-11T17:17:56Z</updated>
<author>
<name />
</author>
<link rel="edit" title="DailyTreasuryYieldCurveRateDatum" href="DailyTreasuryYieldCurveRateData(1)" />
<category term="TreasuryDataWarehouseModel.DailyTreasuryYieldCurveRateDatum" scheme="http://schemas.microsoft.com/ado/2007/08/dataservices/scheme" />
<content type="application/xml">
<m:properties>
<d:Id m:type="Edm.Int32">1</d:Id>
<d:NEW_DATE m:type="Edm.DateTime">1997-01-02T00:00:00</d:NEW_DATE>
<d:BC_1MONTH m:type="Edm.Double" m:null="true" />
<d:BC_2MONTH m:type="Edm.Double" m:null="true" />
<d:BC_3MONTH m:type="Edm.Double">5.190000057220459</d:BC_3MONTH>
<d:BC_6MONTH m:type="Edm.Double">5.3499999046325684</d:BC_6MONTH>
<d:BC_1YEAR m:type="Edm.Double">5.630000114440918</d:BC_1YEAR>
<d:BC_2YEAR m:type="Edm.Double">5.96999979019165</d:BC_2YEAR>
<d:BC_3YEAR m:type="Edm.Double">6.130000114440918</d:BC_3YEAR>
<d:BC_5YEAR m:type="Edm.Double">6.3000001907348633</d:BC_5YEAR>
<d:BC_7YEAR m:type="Edm.Double">6.4499998092651367</d:BC_7YEAR>
<d:BC_10YEAR m:type="Edm.Double">6.5399999618530273</d:BC_10YEAR>
<d:BC_20YEAR m:type="Edm.Double">6.8499999046325684</d:BC_20YEAR>
<d:BC_30YEAR m:type="Edm.Double">6.75</d:BC_30YEAR>
<d:BC_30YEARDISPLAY m:type="Edm.Double">0</d:BC_30YEARDISPLAY>
</m:properties>
</content>
</entry>
</feed>