1

Man, working with dates is hard!!

I have the following data, and am trying to apply Tidyverse principles and using lubridate for the date components

dates <- data.frame(date = as.Date(c('2017-12-17',
                                     '2017-12-28',
                                     '2018-01-03', 
                                     '2018-01-19')))

I want to define the year, and the week of the year for each of the dates, this is simple enough.

new <- dates %>%
             mutate(c_year = year(date),
                    c_week = week(date))

What I really want to know is the exact date (ymd) for the beginning of the 'c_week' for each 'c_year'. However, by the time I want to calculate the beginning of 'c_week' I do not have the original full date to use floor_date(). So I was poking around the internet and found this little bit of code that works (mostly) fine that calculates the new 's_create_wk' variable, and btw I want Mondays to be the start of the week:

new <- dates %>%
             mutate(c_year = year(date),
                    c_week = week(date),
                    s_create_wk = as.Date(paste0(c_year,c_week, "1"),
                                          "%Y%W%u"))

So theoretically I should get:

        date   c_year   c_week  start_of_wk 
1 2017-12-17     2017       51   2017-12-11
2 2017-12-28     2017       53   2017-12-25
3 2018-01-03     2018        1   2018-01-01
4 2018-01-19     2018        3   2018-01-15

But what I am actually getting is this:

        date   c_year   c_week  start_of_wk 
1 2017-12-17     2017       51   2017-12-18
2 2017-12-28     2017       52   2017-12-25
3 2018-01-03     2018        1           NA
4 2018-01-19     2018        3           NA

Looking at my Outlook calendar, 2017-12-25 should be in week 53, but whether I use 'week()' or 'isoweek()' I get an actual as shown. Which is confusing because week 51 is correct for the original date of 2017-12-17. Additionally, 'start_of_wk' is calculating one week late for 'c_week' - this is way confusing. If I subtract off 7 days I get the right 'start-of-wk' for 'c_week', but this seems like the wrong thing to be doing.

To make a long story short and the biggest problem is that I'm getting NAs for 'start_of_wk' for the 2018 dates and I cannot figure out why!!

Sorry if this is confusing, but this is certainly taking me for a spin. I'm guessing that either the formatting for the date is wrong calculating 's_create_wk', but I have tried many combination (Uu, Vv, Ww), or maybe I'm using the wrong package or the wrong functions to work the date components correctly.

Thanks for the help.

Austin Overman
  • 163
  • 2
  • 9
  • If you detail out the reason its no longer in your data (if you used group_by summarise for example), we might be able to find a solution earlier in the problem? – Matt W. Feb 08 '18 at 01:13
  • Without re-creating all the code, I'm generating 'c_week" immediately following the data load, then later summarizing grouped by 'c_week', followed by a join operation where one of the "by" fields is "c_week'. The calculation of the start of the week for 'c_week' is performed on the resultant joined set, and where the problem of the NAs comes to play. The second edit solution by Calum You below solved my immediate problem with the NAs. – Austin Overman Feb 08 '18 at 16:00
  • gotcha. I'm glad your problem was solved! – Matt W. Feb 08 '18 at 20:15
  • Yes, dates can be a pain. `R` is aware of 3 different definitions of week of the year: The US, UK, and ISO 8601 conventions (see [here](https://stackoverflow.com/a/45587644/3817004) for details). Which one are you using? Also note the first day of the week is defined differently, The US definition take Sundays, while the UK and ISO definitions take Mondays. – Uwe Feb 12 '18 at 14:34

1 Answers1

1

I think this does what you want, though it requires an extra function definition to work in the pipe. lubridate has a wday function that sets the day of the week, and in particular wday(x) <- 1 will change date x to be the date that is the start of that week. Parameter week_start controls what day is considered the start of the week; here I chose Monday to start the week to fit with your desired output.

EDIT: To address the part about row 2 being week 52 rather than 53, by my count 2017-12-28 is in week 52 no matter how you count it. See for example week(ymd("2017-12-30")) which is the 364th day of the year and therefore in week 52 (52 * 7 = 364), while week(ymd("2012-12-31")) returns 53 as expected.

library(tidyverse)
dates <- tibble(date = c('2017-12-17', '2017-12-28', '2018-01-03', '2018-01-19'))

wk_start <- function(date){
  wk_st <- date
  wday(wk_st, week_start = 1) <- 1
  return(wk_st)
}

dates %>%
  mutate(date = ymd(date)) %>%
  mutate(year = year(date),
         week = week(date)
         ) %>%
  mutate(start_of_wk = wk_start(date))
# A tibble: 4 x 4
  date        year  week start_of_wk
  <date>     <dbl> <dbl> <date>     
1 2017-12-17  2017 51.0  2017-12-11 
2 2017-12-28  2017 52.0  2017-12-25 
3 2018-01-03  2018  1.00 2018-01-01 
4 2018-01-19  2018  3.00 2018-01-15 

EDIT2: I poked around and I think the reason your as.Date method doesn't work is because the numbers don't have the right spacing. See the result of just paste0(c_year, c_week, "1"). Instead, this similar approach that inserts spacers does provide the desired output except for the first row, which I am still thinking about:

dates %>%
  mutate(c_year = year(date),
         c_week = week(date),
         s_create_wk = as.Date(str_c(c_year, c_week, "1", sep = "-"), "%Y-%W-%u")
         )
# A tibble: 4 x 4
  date       c_year c_week s_create_wk
  <chr>       <dbl>  <dbl> <date>     
1 2017-12-17   2017  51.0  2017-12-18 
2 2017-12-28   2017  52.0  2017-12-25 
3 2018-01-03   2018   1.00 2018-01-01 
4 2018-01-19   2018   3.00 2018-01-15 
Calum You
  • 14,687
  • 4
  • 23
  • 42
  • Calum You, thanks for helping, your second edit was the solution that worked best/easiest for me, but I like the function solution as well. And I will take that the week of year results are correct. I'm certain that the issue is what calendar standard is used and which day starts the week and it can easily get confusing. Thanks again. – Austin Overman Feb 08 '18 at 16:13
  • ok! I would check in your application whether the fact that it makes Dec 18 start of week for Dec 17 is problematic - that doesn't seem right to me, but the `strptime` format logic is less well documented than `lubridate`. – Calum You Feb 08 '18 at 16:26