
This is directly related to my question "POSIX date from dates in weekly time format".

However, in this question I'd like to specifically ask how to map ISO 8601 week numbers to month-of-the-year numbers.

To me, it seems this is not possible, or at least involves non-intuitive hacks that don't work reliably, and IMO it should thus be considered something that needs to be fixed in base R. Please correct me if I'm wrong, though.

EDIT: it seems the issue is closely related to running on Windows and/or the locale you're on (standard German, in my case).

posix <- as.POSIXct(c("2015-12-24", "2015-12-31", "2016-01-01", "2016-01-08"))

ISO 8601

(yw <- format(posix, "%Y-%V"))
# [1] "2015-52" "2015-53" "2016-53" "2016-01"
ywd <- sprintf("%s-1", yw)
(as.POSIXct(ywd, format = "%Y-%V-%u"))
# [1] "2015-01-12 CET" "2015-01-12 CET" "2016-01-12 CET" "2016-01-12 CET"
# -> utterly wrong!!!

ywd <- sprintf("%s-4", yw)
(as.POSIXct(ywd, format = "%Y-%V-%u"))
# -> still wrong -> the day of the week is not the reason

# -> no way to use ISO 8601 convention to map week of the year to month of the year

For the sake of due diligence: it's also not possible when trying to use the US or UK conventions:

US convention

(yw <- format(posix, "%Y-%U"))
# [1] "2015-51" "2015-52" "2016-00" "2016-01"
ywd <- sprintf("%s-1", yw)
(as.POSIXct(ywd, format = "%Y-%U-%u"))
# [1] "2015-12-21 CET" "2015-12-28 CET" NA               "2016-01-04 CET"
# -> NA problem for week 00

ywd <- sprintf("%s-4", yw)
# -> does not work for week 00
(as.POSIXct(ywd, format = "%Y-%U-%u"))
# The day of the week is not the reason

# -> no way to use this convention to reliably map week of the year to month of the year

UK convention

(yw <- format(posix, "%Y-%W"))
# [1] "2015-51" "2015-52" "2016-00" "2016-01"
ywd <- sprintf("%s-1", yw)
(as.POSIXct(ywd, format = "%Y-%W-%u"))
# [1] "2015-12-21 CET" "2015-12-28 CET" NA               "2016-01-04 CET"
# -> NA problem for week 00

ywd <- sprintf("%s-4", yw)
# -> does not work for week 00
(as.POSIXct(ywd, format = "%Y-%W-%u"))
# The day of the week is not the reason

# -> no way to use this convention to reliably map week of the year to month of the year

Session info

> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

locale:
[1] LC_COLLATE=German_Germany.1252     LC_CTYPE=German_Germany.1252       LC_MONETARY=German_Germany.1252   
[4] LC_NUMERIC=C                       LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] fva_0.1.0       digest_0.6.10   readxl_0.1.1    dplyr_0.5.0     plyr_1.8.4      magrittr_1.5   
 [7] memoise_1.0.0   testthat_1.0.2  roxygen2_5.0.1  devtools_1.12.0

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.8     lubridate_1.6.0 assertthat_0.1  packrat_0.4.8-1 crayon_1.3.2    withr_1.0.2    
 [7] R6_2.2.0        DBI_0.5-1       stringi_1.1.2   rstudioapi_0.6  tools_3.3.2     stringr_1.1.0  
[13] tibble_1.2     

> devtools::session_info()
Session info -----------------------------------------------------------------------------------------------
 setting  value                       
 version  R version 3.3.2 (2016-10-31)
 system   x86_64, mingw32             
 ui       RStudio (1.0.136)           
 language en                          
 collate  German_Germany.1252         
 tz       Europe/Berlin               
 date     2017-01-12                  

Packages ---------------------------------------------------------------------------------------------------
 package    * version date       source        
 assertthat   0.1     2013-12-06 CRAN (R 3.3.2)
 crayon       1.3.2   2016-06-28 CRAN (R 3.3.2)
 DBI          0.5-1   2016-09-10 CRAN (R 3.3.2)
 devtools   * 1.12.0  2016-06-24 CRAN (R 3.3.2)
 digest     * 0.6.10  2016-08-02 CRAN (R 3.3.2)
 dplyr      * 0.5.0   2016-06-24 CRAN (R 3.3.2)
 fva        * 0.1.0   <NA>       local         
 lubridate    1.6.0   2016-09-13 CRAN (R 3.3.2)
 magrittr   * 1.5     2014-11-22 CRAN (R 3.3.2)
 memoise    * 1.0.0   2016-01-29 CRAN (R 3.3.2)
 packrat      0.4.8-1 2016-09-07 CRAN (R 3.3.2)
 plyr       * 1.8.4   2016-06-08 CRAN (R 3.3.2)
 R6           2.2.0   2016-10-05 CRAN (R 3.3.2)
 Rcpp         0.12.8  2016-11-17 CRAN (R 3.3.2)
 readxl     * 0.1.1   2016-03-28 CRAN (R 3.3.2)
 roxygen2   * 5.0.1   2015-11-11 CRAN (R 3.3.2)
 stringi      1.1.2   2016-10-01 CRAN (R 3.3.2)
 stringr      1.1.0   2016-08-19 CRAN (R 3.3.2)
 testthat   * 1.0.2   2016-04-23 CRAN (R 3.3.2)
 tibble       1.2     2016-08-26 CRAN (R 3.3.2)
 withr        1.0.2   2016-06-20 CRAN (R 3.3.2)
Rappster

3 Answers


Disclosure: As mentioned in this answer I have created the ISOweek package to deal with ISO 8601 week-based dates.

The question contains several flaws:

  1. The ISO 8601 week-based year is different from the calendar year.
  2. Without specifying a day of week, the conversion of year-week to year-month is ambiguous.

Week-based year vs calendar year

The OP has created sample data using

posix <- as.POSIXct(c("2015-12-24", "2015-12-31", "2016-01-01", "2016-01-08"))
(yw <- format(posix, "%Y-%V"))
[1] "2015-52" "2015-53" "2016-53" "2016-01"

The format specification %Y returns the calendar year, which is the wrong year to pair with an ISO week number, as the third element shows: 1 January 2016 belongs to week 53 of the week-based year 2015.

With the correct format specification %G we get

(yw <- format(posix, "%G-%V"))
[1] "2015-52" "2015-53" "2015-53" "2016-01"
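
To see where the two year specifiers diverge, here is a small sketch that formats every day around the 2015/2016 boundary with both `%Y` and `%G`. The names `dates` and `cmp` are illustrative and not from the question, and `Date` objects are used only to keep time zones out of the picture.

```r
# Every day from Monday 2015-12-28 to Monday 2016-01-04
dates <- seq(as.Date("2015-12-28"), as.Date("2016-01-04"), by = "day")

cmp <- data.frame(
  date       = dates,
  calendar   = format(dates, "%Y-%V"),  # calendar year + ISO week
  week_based = format(dates, "%G-%V"),  # ISO week-based year + ISO week
  stringsAsFactors = FALSE
)

# Rows where the two notations disagree
cmp[cmp$calendar != cmp$week_based, ]
```

Only 1 to 3 January 2016 should show up as mismatches, since those days still belong to ISO week 53 of the week-based year 2015.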

Conversion of week-of-the-year to month-of-the-year

Just providing the ISO week-based year and week number without the day of week will yield ambiguous results.

This can be demonstrated with the (corrected) sample data which now contain three consecutive weeks in the OP's own (non-standard) year-week format:

yw
[1] "2015-52" "2015-53" "2016-01"

With help of the ISOweek2date() function from the ISOweek package the data are converted to calendar dates. Note that ISOweek2date() requires a full ISO 8601 week-based date in the format yyyy-Www-d including the day of week. If we choose the first day of the week (Monday) we do get:

library(ISOweek)
library(magrittr)
yw %>% 
  # insert "W" to conform with ISO 8601 format
  sub("-", "-W", .) %>% 
  # append day of week
  paste0("-1") %>%
  # convert to class Date and print as yyyy-mm 
  ISOweek2date() %>% 
  format("%Y-%m")
[1] "2015-12" "2015-12" "2016-01"

Now, we repeat this using the last day of the week (Sunday):

yw %>% 
  sub("-", "-W", .) %>% 
  paste0("-7") %>% 
  ISOweek2date() %>% 
  format("%Y-%m")
[1] "2015-12" "2016-01" "2016-01"

Note that the second element now refers to January 2016 instead of December 2015 because the Sunday of week 53 is in January and the Monday of this week still is in December.
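
The same comparison can be sketched in base R without the package, assuming only the ISO 8601 rule that 4 January always falls in week 1. The helper name `iso_week_to_date()` is hypothetical and does no validation (for example, it will not reject week 53 in a year that has none). Adding Thursday as a third column is just one possible tie-breaking convention, borrowed from the way ISO 8601 assigns a week to a year; the standard does not prescribe it for months.

```r
# Hypothetical helper: ISO week-based year, week and weekday (1 = Monday) -> Date
iso_week_to_date <- function(iso_year, iso_week, iso_day = 1L) {
  # 4 January is always in ISO week 1
  jan4 <- as.Date(sprintf("%04d-01-04", as.integer(iso_year)))
  # POSIXlt weekday is 0 = Sunday ... 6 = Saturday; remap to 1 = Monday ... 7 = Sunday
  jan4_wday <- (as.POSIXlt(jan4)$wday + 6L) %% 7L + 1L
  week1_monday <- jan4 - (jan4_wday - 1L)
  week1_monday + (as.integer(iso_week) - 1L) * 7L + (as.integer(iso_day) - 1L)
}

yw <- c("2015-52", "2015-53", "2016-01")
parts <- do.call(rbind, strsplit(yw, "-", fixed = TRUE))

# Month of the Monday, Thursday and Sunday of each week
months <- sapply(c(monday = 1L, thursday = 4L, sunday = 7L), function(d) {
  format(iso_week_to_date(parts[, 1], parts[, 2], d), "%Y-%m")
})
rownames(months) <- yw
months
```

A week is unambiguous only when all three columns agree; for `2015-53` the Monday and Thursday columns give December 2015 while the Sunday column gives January 2016.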

Uwe
  • Thanks @Uwe for this, your answer and your package are very helpful (particularly given the failure of lubridate, readr, or base::strptime to address ISO week). However, it does return this ambiguous format: ISOweek::ISOweek("2016-01-02") is "2015-W53" with no day. – cboettig Dec 01 '20 at 22:29

The documentation for R datetime format parameters ?strptime says "%V" will be ignored on input.
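
A quick way to check this on a given platform, as a sketch: if `%V` is ignored, two inputs that differ only in the week number parse to the same date, with month and day filled in from the current date (as `?strptime` describes for unspecified fields). The variable names `a` and `b` are illustrative.

```r
# Same year, different week numbers
a <- as.Date("2015-10", format = "%Y-%V")
b <- as.Date("2015-40", format = "%Y-%V")

# TRUE when %V was ignored on input
identical(a, b)
```

This also explains the output in the question, where every parsed value ends in the month and day on which the code was run.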

IRTFM
  • Unfortunately, also the other ISO 8601 week-based format specifiers `%g` and `%G` (week-based year) are being ignored on input. – Uwe Aug 07 '17 at 09:54
  • Yes, and that is also exactly as is documented in `?strptime`. Please read the help pages. – IRTFM Aug 07 '17 at 21:54

Pretty sure something else besides base R needs changing (see note at end tho):

some_dates <- as.POSIXct(c("2015-12-24", "2015-12-31", "2016-01-01", "2016-01-08"))

(year_week <- format(some_dates, "%Y %U"))
## [1] "2015 51" "2015 52" "2016 00" "2016 01"

(year_week_day <- sprintf("%s 1", year_week))
## [1] "2015 51 1" "2015 52 1" "2016 00 1" "2016 01 1"

(as.POSIXct(year_week_day, format = "%Y %U %u"))
## [1] "2015-12-21 EST" "2015-12-28 EST" "2016-01-04 EST" "2016-01-04 EST"

It works with the dashes, too:

(year_week <- format(some_dates, "%Y-%U"))
## [1] "2015-51" "2015-52" "2016-00" "2016-01"

(year_week_day <- sprintf("%s-1", year_week))
## [1] "2015-51-1" "2015-52-1" "2016-00-1" "2016-01-1"

(as.POSIXct(year_week_day, format = "%Y-%U-%u"))
## [1] "2015-12-21 EST" "2015-12-28 EST" "2016-01-04 EST" "2016-01-04 EST"

and, although dashes are valid ISO form, they can confuse readers when the week or day values are small enough (12 or less) to be mistaken for a month or a day of the month

NOTE

As the comment thread indicates this is the behaviour on Windows:

(year_week <- format(some_dates, "%Y-%U"))
## [1] "2015-51" "2015-52" "2016-00" "2016-01"

(year_week_day <- sprintf("%s-1", year_week))
## [1] "2015-51-1" "2015-52-1" "2016-00-1" "2016-01-1"

(as.POSIXct(year_week_day, format = "%Y-%U-%u"))
## [1] "2015-12-21 PST" "2015-12-28 PST" NA               "2016-01-04 PST"

(Windows 10 64bit, R 3.3.2 for me/this example)
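
One possible reading of the `NA`, offered as a sketch rather than a diagnosis of the Windows behaviour: under the `%U` convention, week 00 holds only the days before the year's first Sunday. Listing those days for 2016 shows whether a Monday (`%u` equal to 1) exists in that week at all. The names `jan` and `week00` are illustrative.

```r
# First ten days of 2016
jan <- seq(as.Date("2016-01-01"), as.Date("2016-01-10"), by = "day")

# Days that %U assigns to week 00
week00 <- jan[format(jan, "%U") == "00"]

# ISO weekday numbers (1 = Monday ... 7 = Sunday) of those days
data.frame(date = week00, weekday = format(week00, "%u"))
```

Since 1 January 2016 was a Friday, week 00 contains only a Friday and a Saturday, so `2016-00-1` names a Monday that does not exist inside 2016. That may be why one platform returns `NA` while others fall forward to `2016-01-04`. The same reasoning applies to `%W`, where week 00 of 2016 runs from Friday to Sunday.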

hrbrmstr
  • Thanks for taking the time. The last line of your code still gives me `[1] "2015-12-21 CET" "2015-12-28 CET" NA "2016-01-04 CET"` on my machine. So it seems to be related to a combination of OS, locale and R. – Rappster Jan 12 '17 at 15:42
  • this (http://rpubs.com/hrbrmstr/241271) is the knitted code/results (i.e. completely fresh R session) in its entirety. it runs perfectly fine on my macOS system and Ubuntu system. – hrbrmstr Jan 12 '17 at 15:44
  • As I said: seems like OS does play a role here – Rappster Jan 12 '17 at 15:45
  • ugh. you had to make me fire up a Windows 10 VM didn't you ;-) Indeed, I get the `NA` result there. Prbly best to post this example to the R mailing list. I checked R's bugzilla and this particular instance doesn't come up but I know there have been inconsistencies in `strptime` on cygwin (well, inconsistent in the sense that it doesn't do some things under the covers that linux- & macOS-proper, i.e. posix "std", do). If I were part of the R core team I'm not sure I'd be inclined to fix this on the base R-side, tho. – hrbrmstr Jan 12 '17 at 15:52
  • Possibly give this a shot: http://gallery.rcpp.org/articles/parsing-datetimes/ as it will use Boost vs native libs. – hrbrmstr Jan 12 '17 at 15:56
  • So my bold part in my question wasn't TOTALLY out of place ;-) Thanks for explicitly co-checking. My experiences with proposing stuff to the R core team haven't been too pleasant in the past as I have the feeling they're only listening to the "big cats" out there. But I'll give it another shot :-) And thanks for the pointer to the Rcpp workaround – Rappster Jan 12 '17 at 15:58
  • Well, as I said, I kinda disagree it's a base R problem. A workaround at the base R level for Windows would likely be slow. I would agree that any discrepancies should be documented better (if there are). If you post to R help I'll try to find the msg and reinforce the behaviour and see if they can get it up on bugzilla vs ignore it. – hrbrmstr Jan 12 '17 at 16:02
  • I just posted on r-help (title "Match ISO 8601 week-of-year numbers to month-of-year numbers on Windows with German locale") – Rappster Jan 12 '17 at 16:14
  • IMHO, the issue partly was caused by using the wrong year specification `%Y` which refers to the calendar year instead of `%G` which refers to the week-based year. – Uwe Aug 07 '17 at 09:51