
I'm trying to pull XML data in R using the code below, and I'm new to the process. All of the data points look correct except for the NEW_DATE column: for the Id = 1 row, NEW_DATE comes back as 852163200000 instead of 1997-01-02T00:00:00, which is what the original XML (below) shows. When I parse the session response, NEW_DATE is returned as a character value that I can't interpret. The only change I made to the code for this post is replacing the proxy URL with # placeholders.

Any help is greatly appreciated!

```r
library(XML)
library(RCurl)
library(xml2)
library(httr)
library(rvest)
library(dplyr)
library(tidyverse)

# set up proxy
my_proxy = use_proxy(url = "##.#.##.##:####")

# set up session and response
my_session = html_session("https://data.treasury.gov/feed.svc/DailyTreasuryYieldCurveRateData", my_proxy)
my_response = my_session$response

# check status
status_code(my_session)
status_code(my_response)

# retrieve parsed content
content_parsed = content(my_session$response, as = "parsed")

# convert list to data frame
ust.df = data.frame(t(sapply(content_parsed$d, c)))

# the <xs:dateTime> data type represents date and time as YYYY-MM-DDThh:mm:ss

# list column names
colnames(ust.df)

# remove X__metadata column
ust.df = ust.df %>%
  select(-1)

# replace "Date" with "" in NEW_DATE column
ust.df$NEW_DATE = gsub("Date", "", paste(ust.df$NEW_DATE))

# replace (, ) and / with "" in NEW_DATE column
ust.df$NEW_DATE = gsub("[[:punct:]]", "", ust.df$NEW_DATE)

# fix NEW_DATE format: it is now a 12-digit string
# Id = 1: NEW_DATE = 852163200000 instead of 1997-01-02T00:00:00 listed below
```
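The `d` element in `content_parsed` suggests the response was actually parsed as OData JSON rather than XML; in that format dates appear to be sent as strings like `/Date(852163200000)/`, which is what the `gsub()` steps above strip down to 12 digits. A minimal sketch of doing that conversion in one step, assuming that string shape; `parse_odata_date()` is a hypothetical helper, not part of any package:

```r
# Hypothetical helper (not part of any package): convert OData-style
# "/Date(<milliseconds>)/" strings to UTC date-times.
# Assumes the raw NEW_DATE values look like "/Date(852163200000)/".
parse_odata_date <- function(x) {
  # Pull out the integer (possibly negative) between the parentheses;
  # anything that does not match the pattern becomes NA.
  digits <- ifelse(grepl("^/Date\\(-?[0-9]+\\)/$", x),
                   sub("^/Date\\((-?[0-9]+)\\)/$", "\\1", x),
                   NA_character_)
  ms <- as.numeric(digits)
  # Milliseconds -> seconds since 1970-01-01, interpreted in UTC
  as.POSIXct(ms / 1000, origin = "1970-01-01", tz = "UTC")
}

raw <- c("/Date(852163200000)/", "/Date(852249600000)/")
dt <- parse_odata_date(raw)
print(dt)
print(as.Date(dt, tz = "UTC"))
```

On the question's data frame this would presumably be `ust.df$NEW_DATE = parse_odata_date(paste(ust.df$NEW_DATE))` in place of the two `gsub()` steps, assuming each element is a single string of that shape. If the `gsub()` steps have already run, `as.POSIXct(as.numeric(ust.df$NEW_DATE) / 1000, origin = "1970-01-01", tz = "UTC")` does the same conversion on the 12-digit strings. Fixing `tz = "UTC"` keeps the date from shifting by a day in time zones west of UTC.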

XML sample for the Id = 1 entry, for reference:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices" xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata" xml:base="http://data.treasury.gov/Feed.svc/">
   <title type="text">DailyTreasuryYieldCurveRateData</title>
   <id>http://data.treasury.gov/feed.svc/DailyTreasuryYieldCurveRateData</id>
   <updated>2021-03-11T17:17:56Z</updated>
   <link rel="self" title="DailyTreasuryYieldCurveRateData" href="DailyTreasuryYieldCurveRateData" />
   <entry>
      <id>http://data.treasury.gov/Feed.svc/DailyTreasuryYieldCurveRateData(1)</id>
      <title type="text" />
      <updated>2021-03-11T17:17:56Z</updated>
      <author>
         <name />
      </author>
      <link rel="edit" title="DailyTreasuryYieldCurveRateDatum" href="DailyTreasuryYieldCurveRateData(1)" />
      <category term="TreasuryDataWarehouseModel.DailyTreasuryYieldCurveRateDatum" scheme="http://schemas.microsoft.com/ado/2007/08/dataservices/scheme" />
      <content type="application/xml">
         <m:properties>
            <d:Id m:type="Edm.Int32">1</d:Id>
            <d:NEW_DATE m:type="Edm.DateTime">1997-01-02T00:00:00</d:NEW_DATE>
            <d:BC_1MONTH m:type="Edm.Double" m:null="true" />
            <d:BC_2MONTH m:type="Edm.Double" m:null="true" />
            <d:BC_3MONTH m:type="Edm.Double">5.190000057220459</d:BC_3MONTH>
            <d:BC_6MONTH m:type="Edm.Double">5.3499999046325684</d:BC_6MONTH>
            <d:BC_1YEAR m:type="Edm.Double">5.630000114440918</d:BC_1YEAR>
            <d:BC_2YEAR m:type="Edm.Double">5.96999979019165</d:BC_2YEAR>
            <d:BC_3YEAR m:type="Edm.Double">6.130000114440918</d:BC_3YEAR>
            <d:BC_5YEAR m:type="Edm.Double">6.3000001907348633</d:BC_5YEAR>
            <d:BC_7YEAR m:type="Edm.Double">6.4499998092651367</d:BC_7YEAR>
            <d:BC_10YEAR m:type="Edm.Double">6.5399999618530273</d:BC_10YEAR>
            <d:BC_20YEAR m:type="Edm.Double">6.8499999046325684</d:BC_20YEAR>
            <d:BC_30YEAR m:type="Edm.Double">6.75</d:BC_30YEAR>
            <d:BC_30YEARDISPLAY m:type="Edm.Double">0</d:BC_30YEARDISPLAY>
         </m:properties>
      </content>
   </entry>
</feed>
```
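If the feed is parsed as XML, as in the sample above, NEW_DATE is already an ISO 8601 string and no epoch arithmetic is needed. A minimal sketch using the xml2 package on an inline, abbreviated copy of the Id = 1 entry (most `BC_*` fields omitted); it does not fetch the live feed, and the `ust` name is illustrative only:

```r
library(xml2)

# Abbreviated copy of the Id = 1 entry shown above (most BC_* fields omitted).
xml_txt <- '<feed xmlns="http://www.w3.org/2005/Atom"
  xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices"
  xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata">
  <entry><content type="application/xml"><m:properties>
    <d:Id m:type="Edm.Int32">1</d:Id>
    <d:NEW_DATE m:type="Edm.DateTime">1997-01-02T00:00:00</d:NEW_DATE>
    <d:BC_1MONTH m:type="Edm.Double" m:null="true" />
    <d:BC_3MONTH m:type="Edm.Double">5.190000057220459</d:BC_3MONTH>
  </m:properties></content></entry>
</feed>'

doc <- read_xml(xml_txt)
ns  <- xml_ns(doc)  # includes the "d" and "m" prefixes declared on <feed>

# One <m:properties> node per entry
props <- xml_find_all(doc, "//m:properties", ns)

# Read one child field per entry; empty (m:null="true") elements give ""
field <- function(name) xml_text(xml_find_first(props, paste0("d:", name), ns))

ust <- data.frame(
  Id        = as.integer(field("Id")),
  NEW_DATE  = as.POSIXct(field("NEW_DATE"), format = "%Y-%m-%dT%H:%M:%S", tz = "UTC"),
  BC_1MONTH = as.numeric(field("BC_1MONTH")),  # "" becomes NA
  BC_3MONTH = as.numeric(field("BC_3MONTH"))
)
print(ust)
```

Reading the fields by name rather than by position avoids depending on column order, and null rates come out as `NA` rather than as empty strings.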
  • My apologies, the original post has been edited. Also, here's the link to the full context of the XML: https://data.treasury.gov/feed.svc/DailyTreasuryYieldCurveRateData?$filter=month(NEW_DATE)%20eq%203%20and%20year(NEW_DATE)%20eq%202021 – drpeanutjones Mar 24 '21 at 19:31
    `852163200000` corresponds to `1997-01-02 00:00` (UTC) as the integer number of milliseconds since the epoch (`1970-01-01`). To convert to a date/time in R, see [Convert UNIX epoch to Date object](https://stackoverflow.com/q/13456241/1422451). – Parfait Mar 24 '21 at 20:13
  • `ust.df$NEW_DATE = anydate(as.numeric(ust.df$NEW_DATE)/1000)` worked for me! I just had to add 1 day to the results to get the correct day. Thank you very much for the help! – drpeanutjones Mar 24 '21 at 21:25
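As a worked check of the conversion discussed in the comments (all values in UTC), dividing by 1000 turns milliseconds into seconds, and dividing by 86,400 gives whole days since 1970-01-01:

$$
\frac{852\,163\,200\,000\ \text{ms}}{1000\ \text{ms/s}} = 852\,163\,200\ \text{s},
\qquad
\frac{852\,163\,200\ \text{s}}{86\,400\ \text{s/day}} = 9\,863\ \text{days}
$$

$$
\underbrace{27 \times 365 + 7}_{\text{1970-01-01 to 1997-01-01}} = 9\,862\ \text{days}
\;\Rightarrow\; \text{day } 9\,863 = \text{1997-01-02 00:00 UTC}
$$

The 7 counts the leap days in 1972, 1976, 1980, 1984, 1988, 1992 and 1996. Because the value is exactly midnight UTC, converting it in a time zone west of UTC (for example US Eastern, UTC-5 in January) lands on the evening of 1997-01-01 local time. That is likely why a +1 day adjustment was needed; converting explicitly with `tz = "UTC"` should make the adjustment unnecessary.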
