5

Hi I am looking to subset some minutely data by time. I normally use xts doing something like:

subset.string <- 'T10:00/T13:00' 
xts.min.obj[subset.string]

to get all the rows which are between 10am and 1pm (inclusive) EACH DAY and have the output as an xts format. But is a bit slow for my purposes...e.g

j <- xts(rnorm(10e6),Sys.time()-(10e6:1))
system.time(j['T10:00/T16:00'])
   user  system elapsed 
  5.704   0.577  17.115 

I know that data.table is v fast and at subsetting large datasets so am wondering if in conjunction with the fasttime package to deal with fast POSIXct creations, if it would be worth it to create a function like

dt.time.subset <- function(xts.min.obj, subset.string){
  require(data.table)
  require(fasttime)
  x.dt <- data.table(ts=format(index(xts.min.obj),"%Y-%m-%d %H:%M:%S %Z"),
                     coredata(xts.min.obj))
  out <- x.dt[,some.subsetting.operation.using."%between%"]
  xts(out,fastPOSIXct(out[,ts])
}

to convert the xts.min.obj into a data.table add some sort of character index and then use data.table to subset the relevant rows use the output row index with fasttime to recreate an xts output? or is this too many excess operations for something that is already highly optimised and written in C?

h.l.m
  • 13,015
  • 22
  • 82
  • 169
  • Sorry, what's your question? You know about `%between%` so what else do you want? – eddi Jun 27 '13 at 14:48
  • well not too sure how to use %between% with times first of all, and at the moment i am converting the index it into character strings rather than using POSIXct, as I was under the impression subsetting of POSIXct is slow...Plus I dont think it works for each day... – h.l.m Jun 27 '13 at 14:51
  • the each day part is solved by doing `format(..., "%H:%M") %between% c("10:00", "13:00")` but I still am not sure why you need `data.table` here for *just* subsetting - I highly doubt it's useful going back and forth between `xts` and `data.table` if your only purpose is subsetting (but you can always give us example data for which your operations are slow to play around with) – eddi Jun 27 '13 at 15:10
  • 1
    have added dummy example data... – h.l.m Jun 27 '13 at 15:42

1 Answers1

6

If you're ok with specifying your range in UTC, you can do:

j[(.index(j) %% 86400) %between% c(10*3600, 16*3600 + 60)]
# +60 because xts includes that minute; you'll need to offset the times
# appropriately to match with xts unless you live in UTC :)

j <- xts(rnorm(10e6),Sys.time()-(10e6:1))
system.time(j[(.index(j) %% 86400) %between% c(10*3600, 16*3600 + 60)])
#  user  system elapsed 
#  1.17    0.08    1.25 
# likely faster on your machine as mine takes minutes to run the OP bench
eddi
  • 49,088
  • 6
  • 104
  • 155
  • 2
    Use `.index` to access the numeric index directly. – Joshua Ulrich Jun 27 '13 at 16:26
  • I did not realise you could use `%between%` outside of a data.table, also if the original object is not in UTC, it it better to convert the index first, then convert the subsetting times... i.e.if looking to subset between 10am and 4pm which references the original xts object which has a timezone set in Berlin, then i would convert those to the 10am to UTC so is that right? i.e. 8am UTC? then use your method to actually subset, and finally reconvert back to the original timezone? – h.l.m Jun 27 '13 at 17:08
  • @h.l.m I think timezone conversion is going to be quite computationally expensive and I would avoid it if I could – eddi Jun 27 '13 at 17:46
  • how would you adjust the `%between% c(10*3600, 16*3600 + 60)` part to deal with timezones if i shouldn't convert the index timezone to UTC? as my efforts dont seem to be working... – h.l.m Jun 27 '13 at 18:34
  • compare `index(j)[1]` with `as.POSIXlt(index(j)[1], tz = "UTC")` to see the difference; in my case it's "09:00 CDT" and "14:00 UCT", so to do a "T10:00/T16:00" comparison I'd have to adjust those numbers to `c((10+5)*3600, (16+6)*3600 + 60)`. Beware of daylight savings time. – eddi Jun 27 '13 at 18:46
  • Timezone issues and daylight savings issues were annoying but got there in the end! thanks for all your help @eddi ! – h.l.m Jun 28 '13 at 11:02