I'm trying to set up a new variable that incorporates the difference (in number of days) between a known date and the end of a given year. Dummy data below:

> Date.event <- as.POSIXct(c("12/2/2000","8/2/2001"), format = "%d/%m/%Y", tz = "Europe/London")
> Year = c(2000,2001)
> Dates.test <- data.frame(Date.event,Year)
> Dates.test
  Date.event Year
1 2000-02-12 2000
2 2001-02-08 2001

I've tried applying a function to achieve this, but it returns an error:

> Time.dif.fun <- function(x) {
+ as.numeric(as.POSIXct(sprintf('31/12/%s', s= x['Year']),format = "%d/%m/%Y", tz = "Europe/London") - x['Date.event'])
+ }
> Dates.test$Time.dif <- apply(
+ Dates.test, 1, Time.dif.fun
+ )

 Error in unclass(e1) - e2 : non-numeric argument to binary operator 

It seems that apply() does not like as.POSIXct(): when I test a version of the function that only derives the end-of-year date, the result comes back as a number such as 978220800 (for the end of 2000). Is there any way around this? For the real data the function is a bit more complex, with conditional branches that use different variables and sometimes refer to previous rows, which would be very hard to do without apply().

Imran Ali
  • related question: https://stackoverflow.com/questions/14454476/get-the-difference-between-dates-in-terms-of-weeks-months-quarters-and-years – C8H10N4O2 Aug 22 '17 at 12:39

3 Answers

Here are some alternatives:

1) Your code works with these changes. We factored out s, not because it is necessary, but only because the following line gets very hard to read without that due to its length. Note that if x is a data frame then so is x["Year"] but x[["Year"]] is a vector as is x$Year. Since the operations are all vectorized we do not need apply.

Although we have not made this change, it would be a bit easier to define s as s <- paste0(x$Year, "-12-31") in which case we could omit the format argument in the following line owing to the use of the default format.

Time.dif.fun <- function(x) {
  s <- sprintf('31/12/%s', x[['Year']])
  as.numeric(as.POSIXct(s, format = "%d/%m/%Y", tz = "Europe/London") - x[['Date.event']])
}
Time.dif.fun(Dates.test)
## [1] 323 326
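
The alternative mentioned above, building `s` with paste0() so that the default format applies, might look like the sketch below. `Time.dif.fun2` and `end.of.year` are hypothetical names used only for illustration, and difftime(..., units = "days") is swapped in for `-` so that the unit is fixed rather than chosen automatically.

```r
# Dummy data from the question
Dates.test <- data.frame(
  Date.event = as.POSIXct(c("12/2/2000", "8/2/2001"),
                          format = "%d/%m/%Y", tz = "Europe/London"),
  Year = c(2000, 2001)
)

# Hypothetical variant of Time.dif.fun: ISO-style strings need no format argument
Time.dif.fun2 <- function(x) {
  s <- paste0(x$Year, "-12-31")
  end.of.year <- as.POSIXct(s, tz = "Europe/London")
  # units = "days" pins the unit instead of letting R pick one
  as.numeric(difftime(end.of.year, x$Date.event, units = "days"))
}

Time.dif.fun2(Dates.test)
## [1] 323 326
```

Because POSIXct differences measure elapsed time, an event date that falls in British Summer Time would give a result that is one hour (1/24 of a day) more than a whole number of days.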

2) Convert to POSIXlt, set the year, month and day to the end of the year and subtract. Note that the year component uses years since 1900 and the mon component uses Jan = 0, Feb = 1, ..., Dec = 11. See ?as.POSIXlt for details on these and other components:

lt <- as.POSIXlt(Dates.test$Date.event)
lt$year <- Dates.test$Year - 1900
lt$mon <- 11
lt$mday <- 31
as.numeric(lt - Dates.test$Date.event)
## [1] 323 326
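
As a quick check of the component conventions described above, this small sketch inspects a single POSIXlt value (the date is the first one from the dummy data):

```r
lt <- as.POSIXlt("2000-02-12", tz = "Europe/London")

lt$year          # years since 1900
## [1] 100
lt$mon           # months since January, so February is 1
## [1] 1
lt$mday          # day of the month
## [1] 12
lt$year + 1900   # back to the calendar year
## [1] 2000
```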

3) Another possibility is:

with(Dates.test, as.numeric(as.Date(paste0(Year, "-12-31")) - as.Date(Date.event)))
## [1] 323 326
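
One thing to watch when converting a POSIXct value with as.Date() is the time zone used for the conversion. The sketch below passes `tz` explicitly in both calls to show how a local midnight during British Summer Time maps to a different calendar day in UTC. The summer date is made up for illustration and is not part of the question's data.

```r
x <- as.POSIXct("2001-07-01", tz = "Europe/London")  # midnight, British Summer Time

as.Date(x, tz = "Europe/London")
## [1] "2001-07-01"
as.Date(x, tz = "UTC")   # the same instant is 23:00 on the previous day in UTC
## [1] "2001-06-30"
```

The dummy dates fall in winter, when London time coincides with UTC, so the result above is not affected.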
G. Grothendieck

You could use the difftime function:

Dates.test$diff_days <- difftime(
  as.POSIXct(paste0(Dates.test[, 2], "-12-31"), format = "%Y-%m-%d", tz = "Europe/London"),
  Dates.test[, 1],
  units = "days"
)
count

You can use ISOdate() to build the end-of-year date, and difftime(..., units = 'days') to get the number of days until the end of the year.

From ?difftime:

Limited arithmetic is available on "difftime" objects: they can be added or subtracted, and multiplied or divided by a numeric vector.

If you want to do more than the limited arithmetic, just coerce with as.numeric(), but you will have to stick with whatever units you specified.
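
A minimal sketch of that arithmetic, using the first date from the dummy data. The coefficients 2 and 0.5 are arbitrary placeholders, not values from the question.

```r
d <- difftime(as.Date("2000-12-31"), as.Date("2000-02-12"), units = "days")
d
## Time difference of 323 days

d * 2                  # still a difftime
## Time difference of 646 days

days <- as.numeric(d)  # plain number, in the units chosen above
days * 0.5
## [1] 161.5

# as.numeric() on a difftime also accepts a units argument if another unit is needed
as.numeric(d, units = "hours")
## [1] 7752
```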

By convention, you may wish to use the beginning of the next year (midnight on new year's eve) as your endpoint for that year. For example:

Dates.test <- data.frame(
  Date.event = as.POSIXct(c("12/2/2000","8/2/2001"), 
                          format = "%d/%m/%Y", tz = "Europe/London")
)
# helper equivalent to data.table::year(): extract the calendar year of a date
year <- function(x) as.POSIXlt(x)$year + 1900L
# ISOdate() defaults to 12:00 GMT, so use ISOdatetime() to get local midnight
Dates.test$Date.end <- ISOdatetime(year(Dates.test$Date.event) + 1, 1, 1, 0, 0, 0,
                                   tz = "Europe/London")

# if you don't want class 'difftime', wrap it in as.numeric(), as in:
Dates.test$Date.diff <- as.numeric(
  difftime(Dates.test$Date.end,
           Dates.test$Date.event,
           units = 'days')
)
Dates.test
#   Date.event   Date.end Date.diff
# 1 2000-02-12 2001-01-01       324
# 2 2001-02-08 2002-01-01       327

The apply() family is essentially a tidy way of writing for loops; where possible, prefer more efficient, vectorized solutions.
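
For context on the original error: apply() converts a data frame to a matrix before looping, and a matrix can hold only one type, so the date column arrives in the function as text. The sketch below shows the coercion, followed by one possible row-wise workaround that indexes the data frame by row number so that each column keeps its class. The workaround is an illustration under the question's dummy data, not a recommendation over the vectorized versions.

```r
Dates.test <- data.frame(
  Date.event = as.POSIXct(c("12/2/2000", "8/2/2001"),
                          format = "%d/%m/%Y", tz = "Europe/London"),
  Year = c(2000, 2001)
)

# What apply() sees: every column has been coerced to character
m <- as.matrix(Dates.test)
typeof(m)
## [1] "character"
m[1, "Date.event"]
## [1] "2000-02-12"

# Row-wise alternative: loop over row indices; Dates.test[i, ] keeps column classes,
# and Dates.test[i - 1, ] would be available for look-ups in the previous row
sapply(seq_len(nrow(Dates.test)), function(i) {
  row <- Dates.test[i, ]
  end.of.year <- as.POSIXct(paste0(row$Year, "-12-31"), tz = "Europe/London")
  as.numeric(difftime(end.of.year, row$Date.event, units = "days"))
})
## [1] 323 326
```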

C8H10N4O2
  • Some useful tips, thanks. Is year() a function of an extra package? It was not recognised for me. The problem with vectorised solutions is that the real function has more than one potential known date, and sometimes it needs to go back to the previous year to look for additional dates, so there are a few ifelse() functions I want to carry out. The 'date difference' as above is also not the end point of the output, I need to convert the number of days to a numeric in order to scale by extra coefficients. – user8500124 Aug 22 '17 at 12:59
  • whoops, `year` is actually in `data.table`, but instead of loading the package you can just define it as edited above. – C8H10N4O2 Aug 22 '17 at 13:40
  • @user See edits. `ifelse()` is vectorized so I'm not sure what you're getting at. This solution works on the example you provided. Please be sure that your examples are representative of the question you are trying to ask. – C8H10N4O2 Aug 22 '17 at 13:46