0

I have a dataset like so

data <- data.table::data.table(id = seq(1:5),
                               t0_year = c(2000,2001,2002,2001,2000),
                               t0_week = c(4,32,44,15,11),
                               t1_date = as.Date(c("2010-07-31", "2010-12-31", "2010-02-08", "2005-07-05", "2000-07-20")))

I would like to calculate the time difference in weeks between t1 and t0, but for t0, I only have the year and week number, while for t1, I have the exact date.

I have tried different things with lubridate, but without much luck. Does anyone have a solution?

epiNS
  • 353
  • 2
  • 7
  • One *cannot* calculate the difference in a fixed unit between a *date* and a *week*, they are different. The best one might do is determine a range, where the difference might be a number plus-or-minus-3.5 to indicate that you don't know which of the values is correct. If your "week" is assumed to start on a Sunday or a Monday or whichever date you choose, then ... change `t0` to a `Date` (permanently, as a new column, or on-the-fly) and do a *real* difference calculation using `-` or `difftime`. – r2evans Oct 19 '21 at 12:24
  • 1
    As a coarse example, what is (precisely) the difference between `2021-10-19` and `2021-Jan`? I've generalized this to be a month and not a week, but the point is the same. The value is somewhere between 261 and 291 (inclusive), and given what we know it could be any of those with equal probability. If we *assume* that somebody is most likely to want the difference from the first or last day, then it is *one of* 261 or 291, but to know which we need to know intent and/or context of the original question. – r2evans Oct 19 '21 at 12:27

2 Answers2

3

One cannot calculate the difference in a fixed unit between a date and a week, they are different. The best one might do is determine a range, where the difference might be a number plus-or-minus-3.5 to indicate that you don't know which of the values is correct. If your "week" is assumed to start on a Sunday or a Monday or whichever day you choose, then ... change t0 to a Date (permanently, as a new column, or on-the-fly) and do a real difference calculation using - or difftime.

As a coarse example, what is (precisely) the difference between 2021-10-19 and 2021-Jan? I've generalized this to be a month and not a week, but the point is the same. The value is somewhere between 261 and 291 (inclusive), and given what we know it could be any of those with equal probability. If we assume that somebody is most likely to want the difference from the first or last day of said month, then it is one of 261 or 291, but to know which we need to know intent and/or context of the original question.

For the discussion of "how to convert year-month into a Date object", see Transform year/week to date object.

Here are two options, depending on Sunday- or Monday-based weeks. (I chose those two days because those appear to be the most frequent week-start-days; however, whichever you use is up to you and the context or your data, I make no assertion that these are better than others.)

library(data.table)
data[, t0_sun := as.Date(sprintf("%4i%02i7", t0_year, t0_week), format = "%Y%U%u")
  ][, t0_mon := as.Date(sprintf("%4i%02i1", t0_year, t0_week), format = "%Y%U%u")
  ][, d := as.numeric(t1_date - t0_sun, "weeks")]
#       id t0_year t0_week    t1_date     t0_mon     t0_sun         d
#    <int>   <num>   <num>     <Date>     <Date>     <Date>     <num>
# 1:     1    2000       4 2010-07-31 2000-01-24 2000-01-23 548.85714
# 2:     2    2001      32 2010-12-31 2001-08-13 2001-08-12 489.71429
# 3:     3    2002      44 2010-02-08 2002-11-04 2002-11-03 379.14286
# 4:     4    2001      15 2005-07-05 2001-04-16 2001-04-15 220.28571
# 5:     5    2000      11 2000-07-20 2000-03-13 2000-03-12  18.57143

(I calculated both t0_sun and t0_mon just to demonstrate both Sunday and Monday calcs, note the different digit appended to the string. You only need one.)

Note that from ?strptime (and the linked question/answer about year-week to Date), "%U" is using a US-based standard; one might instead choose "%W" for UK-based weeks.

data[, t0_sun := as.Date(sprintf("%4i%02i0", t0_year, t0_week), format = "%Y%W%w")
  ][, t0_mon := as.Date(sprintf("%4i%02i1", t0_year, t0_week), format = "%Y%W%2")
  ][, d := as.numeric(t1_date - t0_sun, "weeks")]
r2evans
  • 141,215
  • 6
  • 77
  • 149
  • This makes sense. What if t1 also was in year and week, eg. data <- data.table::data.table(id = seq(1:5), t0_year = c(1994,1996,1992,1990,1999), t0_week = c(4,32,44,15,11), t1_year = c(2006,2007,2005,2010,2016), t1_week = c(7,38,40,12,6)) Could we calculate the difference in weeks then, without first creating an "artificial" date object? – epiNS Oct 19 '21 at 19:27
  • 1
    If `t1` and `t2` are both in year/week format, *and* one assumes the same day-of-week for each, then I think you *still* need to use `as.Date` in the same way as above (using `"%U"` or `"%W"`, your choice). (Granted, it is certainly feasible to work around it by trying modulus math with year and weeks-per-year and such, and ... it involves enough exceptions to the rule that it's easy to end up being inaccurate at some point. Really, use libraries that know the rules and are tried-and-tested. Convert to `Date`, use simple subtraction, and move on to the next big problem :-) – r2evans Oct 19 '21 at 19:30
  • 1
    Granted, you do not need to *store* the interim `Date` object, you can do `difftime(as.Date(sprintf(...)), as.Date(sprintf(...)), units="weeks")` or similar. But the interim objects still should be created. – r2evans Oct 19 '21 at 19:31
0

Does this work:

library(dplyr)
library(lubridate)

data %>% mutate(t0_date = as.Date(paste(t0_year, round(day(weeks(t0_week))/30), '01', sep = '-'), format = '%Y-%m-%d'), diff_weeks = difftime(t1_date,t0_date, units = 'weeks'))
   id t0_year t0_week    t1_date    t0_date      diff_weeks
1:  1    2000       4 2010-07-31 2000-01-01 552.00000 weeks
2:  2    2001      32 2010-12-31 2001-07-01 495.71429 weeks
3:  3    2002      44 2010-02-08 2002-10-01 383.85714 weeks
4:  4    2001      15 2005-07-05 2001-04-01 222.28571 weeks
5:  5    2000      11 2000-07-20 2000-03-01  20.14286 weeks
Karthik S
  • 11,348
  • 2
  • 11
  • 25