2

I have an irregular time series of all trades of a given ETF over a span of 4 years:

 > head(BKF.xts)
                    BKF.xts
2008-01-02 09:30:01  59.870
2008-01-02 09:38:04  59.710
2008-01-02 09:39:51  59.612
2008-01-02 09:51:16  59.640
2008-01-02 10:06:08  59.500
> tail(BKF.xts)
                    BKF.xts
2011-12-30 15:59:23   36.26
2011-12-30 15:59:53   36.26
2011-12-30 15:59:56   36.27
2011-12-30 15:59:57   36.27
2011-12-30 15:59:58   36.27
2011-12-30 16:00:00   36.33

What I would like is to have the prices at every 5 minute interval for ALL trading days. Because I am dealing with ETFs it's possible that there are dates where the market is open that the ETF did not trade and so there will be no data for that date in my sample. However i need my final time series to account for all trading days. I have downloaded daily data for the same period so that I have another time series of every trading day. Not sure if that helps.

Also if there is no particular trade at one 5:00 minute time stamp I would like for the price of the most recent trade that took place. So for the data i posted above, what I would want is:

> head(BKF.xts)
                        BKF.xts
    2008-01-02 09:35:00  59.870
    2008-01-02 09:40:00  59.612
    2008-01-02 09:45:00  59.612
    2008-01-02 09:50:00  59.640
    2008-01-02 09:55:00  59.640

Any help is greatly appreciated.

Karina
  • 31
  • 1
  • 3
  • Related question: http://stackoverflow.com/questions/9778632/r-xts-generating-1-minute-time-series-from-second-events – Vincent Zoonekynd Mar 23 '12 at 03:04
  • @VincentZookekynd The solution to that question is to use to.minutes5... which i tried already and did not get what I want: – `> head(BKF.test) BKF.xts.Open BKF.xts.High BKF.xts.Low BKF.xts.Close 2008-01-02 09:30:01 59.87 59.87 59.870 59.870 2008-01-02 09:39:51 59.71 59.71 59.612 59.612 2008-01-02 09:51:16 59.64 59.64 59.640 59.640 2008-01-02 10:06:08 59.50 59.50 59.500 59.500 2008-01-02 10:13:36 59.55 59.55 59.550 59.550`. – Karina Mar 23 '12 at 03:35
  • And the other solution merges with a regular time series which means I would have data for ALL days of the year instead of trading days. – Karina Mar 23 '12 at 03:51
  • The answers to that question also suggested to use `align.time` to have the timestamps exactly at the end of each 5-minute period, to use `cbind` to merge the result with an emtpy time series with all the desired timestamps (5-minute-spaced timestamps, for "all trading days", in your case -- you need to define what you mean by "trading day", this is market-dependent), and to fill the missing values with `na.locf`. You can then extract the last column with `Cl`. – Vincent Zoonekynd Mar 23 '12 at 04:00
  • For all trading days in the United States. As I explained I have another time series of all dates where the market was open. Right now the only solution I can think of is to merge with an empty time series and use a loop to delete all values where the markets were not open (according to my other time series). This is going to take very long as my sample size is over 55million. Which is why I was asking if there's an easier way. – Karina Mar 23 '12 at 04:07

2 Answers2

2

As mentionned in a previous question, you can use to.period to have the last value in each 5-minute period, align.time to replace the timestamps with the end of each period, cbind to add the missing periods (with a missing value) and na.locf to replace the missing values.

# Sample data
library(quantmod)
days <- seq(Sys.Date(), by=1, length=20)
days <- days[ ! format(days, "%A") %in% c("Saturday", "Sunday") ]
timestamps <- ISOdatetime( 
  year(days), month(days), day(days), 
  9, 0, 0  # You may want/need to add the timezone
)
timestamps <- timestamps[-2] 
x <- lapply(timestamps, function(u) sort(u + sample(60*60*8,200)))
x <- do.call(c, x)
x <- xts(rnorm(length(x)), x)

# Value at the end of each 5-minute period
y <- to.minutes5(x)
y <- Cl(y)
y <- align.time(y, 5*60)

# All 5-minute periods, betweem 9am (excluded) 
# and 5pm (included) for each day 
z <- lapply(timestamps, function(u) u + 5*60*(1:(12*8)))
z <- do.call(c, z)
z <- cbind(y, xts(, z))

# Fill in missing values
z <- na.locf(z)
Community
  • 1
  • 1
Vincent Zoonekynd
  • 31,893
  • 5
  • 69
  • 78
1

Thanks, I actually figured it out on my own after enough trial and error and discovering the xts subset function. Here's what I did:

    #BKF here is my data set
    BKF<-xts(BKF$PRICE,order.by=BKF$DATE)
    colnames(BKF)=c("Price")
    BKF<-to.minutes5(BKF)
    BKF<-align.time(BKF,5*60)

    #create a regular time series that has values for each 5 minute interval and use cbind to merge with my data
    tmp<-xts(,seq.POSIXt(start(BKF),end(BKF),by="5 mins"))
    BKF<-cbind(tmp,BKF)

    # subset data from 9:30am to 4:00pm and replace NA's with last observation
    BKF<-BKF["T09:30:00/T16:00:00"]
    BKF<-na.locf(BKF)

    # SP here is daily S&P data for the same sample period
    SP<-xts(order.by=as.Date(td$Date,tz="",format="%y-%m-%d"))

   # Subset observations for all trading days according to the daily S&P data
   test<-bt[as.Date(index(bt),tz="")%in%as.Date(index(td),tz="")]

Done.

Karina
  • 31
  • 1
  • 3
  • 3
    Be careful, you may be filling forward data from the previous day which may not be what you want. I'd probably use `BKF <- do.call(rbind, lapply(split(BKF, "days"), na.locf))` instead of `BKF<-na.locf(BKF)` – GSee Mar 30 '12 at 16:58