1

Currently trying to write some that would return the last date from an ordered list that is less than date X.

Right now I have this: it gets a list of days, and gets an index off the day we're going to be doing search on and range of how many dates we want to go back.

After that it checks if the date exists or not (e.g. Feb 30th). If the date doesn't exist, it decreases the date by 1 and then applies filter again (otherwise it tries to subtract 1 day from NA and fails).

library(lubridate)
getDate <- function(dates,day,range){
    if(range == 'single')
        {return (day-1)}

    z <- switch(range,
        single = days(1),
        month = days(30),
        month3 = months(3),
        month6 = months(6),
        year = years(1)
        )

    new_day <-(dates[day]-z)
    i <- 1
    while (is.na(new_day)){
        new_day <- dates[day] - days(i) - z 
    }
    ind<-which.min(abs (diff <-(new_day-dates)))

    if (diff[ind] < 0)
    {ind <- ind -1}

    return (ind[1])
}

While this function works, the problem is the speed efficiency. I have a feeling that which.min(abs()) is far from the quickest and I'm wondering if there are any better alternatives (outside of also writing my own function to search lists).

stocks <- list(structure(list(sec = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), min = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), hour = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L), mday = c(2L, 3L, 4L, 7L, 8L, 9L, 10L, 11L, 14L, 15L, 16L, 17L,
18L, 22L, 23L, 24L, 25L, 28L, 29L, 30L, 31L, 1L, 4L, 5L, 6L), mon = c(0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L,
1L, 1L, 1L), year = c(108L, 108L, 108L, 108L, 108L, 108L, 108L, 108L, 108L,
108L, 108L, 108L, 108L, 108L, 108L, 108L, 108L, 108L, 108L, 108L, 108L, 108L,
108L, 108L, 108L), wday = c(3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L,
2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L), yday = c(1L, 2L, 3L, 6L, 7L,
8L, 9L, 10L, 13L, 14L, 15L, 16L, 17L, 21L, 22L, 23L, 24L, 27L, 28L, 29L, 30L,
31L, 34L, 35L, 36L), isdst = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)), .Names = c("sec", "min",
"hour", "mday", "mon", "year", "wday", "yday", "isdst"), tzone = "UTC",
class = c("POSIXlt", "POSIXt")))

old_pos <- getDate(stocks[[1]],21,"month") #should return 0
old_pos <- getDate(stocks[[1]],22,"month") #should return 1

This does not return a vector, nor a date, only an index and the main question isn't about working (which it does), but optimizing it.

The value is later on being used in another function, one possible speed up is to first match all of the old indexes to new ones and then return that as another list. However not sure if it would offer any speed up.

Joshua Ulrich
  • 173,410
  • 32
  • 338
  • 418
Gray_Hound
  • 127
  • 1
  • 3
  • 12
  • 3
    Please provide some reproducible data for testing. – Roland Jun 21 '13 at 14:48
  • http://pastebin.com/sDXMSft6 or http://pastebin.com/vLVvwjHd (The only value being passed in as stocks is stocks[[1]]) – Gray_Hound Jun 21 '13 at 14:54
  • 2
    Rather than pasting a link to printed data, please provide the output from `dput(head(stocks[[1]],20))` and an example of how we should call your `getDate` function. Then your question would contain a minimal, [reproducible example](http://stackoverflow.com/q/5963269/271616). – Joshua Ulrich Jun 21 '13 at 15:03
  • @Gray_Hound you can be inspired by my answer t create a reproducible example with the expected result. Creating a good example is generally half path to the solution. – agstudy Jun 21 '13 at 15:07
  • http://pastebin.com/tQDwPLip – Gray_Hound Jun 21 '13 at 15:27
  • 3
    I agree with the others who are encouraging you to improve how you ask a question. There is too much irrelevant code and excessive data in the question as it now appears. Its not just a matter of dumping whatever you have onto SO but extracting the meat of it and asking just that. I have answered @agstudy's reformulation of your question in my answer. I suspect not many others would be willing to wade through what you have presented either. – G. Grothendieck Jun 21 '13 at 16:23
  • @G.Grothendieck: I just shortened the data and adjusted the example. – Joshua Ulrich Jun 21 '13 at 16:42

2 Answers2

3

Using @agstudy's reformulation including sDate and x.Date

data.table

We can perform the calculations in data.table like this where the first column shows the original date in sDate and the second column is the corresponding x.Date date:

> library(data.table)
> data.table(date = x.Date, x.Date, key = "date")[J(sDate),, roll = TRUE]
         date     x.Date
1: 2003-02-03 2003-02-02
2: 2003-02-12 2003-02-10
3: 2003-02-16 2003-02-15

sqldf Using sqldf its like this:

> library(sqldf)
> sDateDF <- data.frame(sDate = sDate)
> xDateDF <- data.frame(xDate = x.Date)
> 
> sqldf("select s.sdate sDate, max(x.xdate) xDate 
+   from sDateDF s join xDateDF x on x.xDate <= s.sDate 
+   group by s.sDate")
       sDate      xDate
1 2003-02-03 2003-02-02
2 2003-02-12 2003-02-10
3 2003-02-16 2003-02-15

zoo

Using zoo, we create two zoo series, merge them and use na.locf like this. The result is the x.Date corresponding to each sDate (i.e. the second column in either of the above solutions):

> library(zoo)
>
> zx <- zoo(seq_along(x.Date), x.Date)
> zs <- zoo(seq_along(sDate), sDate)
> x.Date[na.locf(merge(zx, zs))[sDate, "zx"]]
[1] "2003-02-02" "2003-02-10" "2003-02-15"
G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341
2

If I understand you have a vector of dates, for example :

x.Date <- as.Date("2003-02-01") + c(1, 3, 7, 9, 14,20) 
"2003-02-02" "2003-02-04" "2003-02-08" "2003-02-10" "2003-02-15" "2003-02-21"

and giving a vector of dates, for example:

sDate <- as.Date("2003-02-01") + c(2,11,15)

You try to get the closer date in x.Date to this giving date but less than this date:

 lapply(sDate,function(x)max(x.Date[x.Date-x <=0]))
[[1]]
[1] "2003-02-02"

[[2]]
[1] "2003-02-10"

[[3]]
[1] "2003-02-15"
agstudy
  • 119,832
  • 17
  • 199
  • 261