I have week-date data in the form yyyy-ww, where ww is the two-digit week number. The data span 2007-01 to 2010-30. The week-counting convention is ISO 8601, under which, as Wikipedia's "Week number" article explains, a year occasionally has 53 weeks. For example, 2009 had 53 weeks under this system (see the week numbers in an ISO 8601 calendar). As per the Wikipedia article, 53rd weeks are fairly rare.

Basically, I want to read the week date in, convert it to a Date object, and save it to a separate column in a data.frame. As a test, I converted the Date objects back to yyyy-ww format with format([Date-object], format = "%Y-%W"), and this exposed a problem at 2009-53: R fails to interpret that week as a date and returns NA. This is very odd, because some years that do not have a 53rd week (in the ISO 8601 standard) are converted fine, such as 2007-53, whereas other years that also do not have a 53rd week fail, such as 2008-53.

The following minimal example demonstrates the issue:

dates <- c("2009-50", "2009-51", "2009-52", "2009-53", "2010-01", "2010-02")
as.Date(x = paste(dates, 1), format = "%Y-%W %w")
# [1] "2009-12-14" "2009-12-21" "2009-12-28" NA           "2010-01-04"
# [6] "2010-01-11"

other.dates <- c("2007-53", "2008-53", "2009-53", "2010-53")
as.Date(x = paste(other.dates, 1), format = "%Y-%W %w")
# [1] "2007-12-31" NA           NA           NA     

The question is, how do I get R to accept week numbers in ISO 8601 format?

Note: This question summarises a problem I have been struggling with for a few hours. I have searched and found various helpful posts such as this, but none solved the problem.

dynamo
  • It might be more illustrative to compare `as.Date(x = "2009-01 01", format = "%Y-%W %w")` with `ISOweek2date("2009-W01-1")` and you should also quote the entry for `%W` in `help(strptime)`. – Roland Feb 18 '13 at 13:59
  • Not sure, but I recall that a lot of R's date processing is actually handled by system libraries, which would mean that this sort of issue (a) would vary a lot from OS to OS; (b) might be particularly dodgy on Windows; (c) would be hard to fix in R itself (as seen in the answer below; `ISOweek` implements its own algorithms since stuff is missing from Windows' system libraries) – Ben Bolker Feb 18 '13 at 13:59
  • @BenBolker The behaviour is defined in `help(strptime)`. – Roland Feb 18 '13 at 14:02
  • Yeah, `ISOweek` relies on `%V`, which is not implemented in Windows. So this really is a Windows problem. As written in the `strptime` help. – dynamo Feb 18 '13 at 14:15

1 Answer

The ISOweek package handles ISO 8601 week numbering, converting to and from Date objects in R. See ISOweek for more. Continuing with the example dates above, we first need to modify the formatting a bit: they must be in the form yyyy-Www-w rather than yyyy-ww, e.g. 2009-W53-1. The final digit is the day of the week (1 = Monday through 7 = Sunday) used to pick a date within the week; here it is the Monday. The week number must be two digits.

library(ISOweek)

dates <- c("2009-50", "2009-51", "2009-52", "2009-53", "2010-01", "2010-02")
other.dates <- c("2007-53", "2008-53", "2009-53", "2010-53")

dates <- sub("(\\d{4}-)(\\d{2})", "\\1W\\2-1", dates)
other.dates <- sub("(\\d{4}-)(\\d{2})", "\\1W\\2-1", other.dates)

## Check:
dates
# [1] "2009-W50-1" "2009-W51-1" "2009-W52-1" "2009-W53-1" "2010-W01-1"
# [6] "2010-W02-1"

(iso.date <- ISOweek2date(dates))             # handles week 53 correctly
# [1] "2009-12-07" "2009-12-14" "2009-12-21" "2009-12-28" "2010-01-04"
# [6] "2010-01-11"
(iso.other.date <- ISOweek2date(other.dates)) # also deals with this
# [1] "2007-12-31" "2008-12-29" "2009-12-28" "2011-01-03"

## Check that back-conversion works:
all(date2ISOweek(iso.date) == dates)
# [1] TRUE

## This does not work for the others, since the 53rd week of
## e.g. 2008 is back-converted to the first week of 2009, in
## line with the ISO 8601 standard.
date2ISOweek(iso.other.date) == other.dates
# [1] FALSE FALSE  TRUE FALSE
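
If installing a package is not an option, the same conversion can be sketched in base R. This relies on the ISO 8601 rule that week 1 is the week containing 4 January. The helper name `iso_week_monday` is hypothetical (it is not part of ISOweek or base R), and the sketch does no validation, so a week 53 in a year that has none simply rolls over into the next year, as ISOweek2date does above.

```r
# Hypothetical helper, not part of ISOweek: returns the Monday of ISO week
# `week` in ISO year `year`. Assumes integer inputs and performs no validation.
iso_week_monday <- function(year, week) {
  jan4 <- as.Date(sprintf("%d-01-04", year))
  # POSIXlt$wday is 0 = Sunday ... 6 = Saturday; shift so that 0 = Monday
  days_since_monday <- (as.POSIXlt(jan4)$wday + 6L) %% 7L
  # ISO week 1 is the week that contains 4 January
  week1_monday <- jan4 - days_since_monday
  week1_monday + 7L * (week - 1L)
}

# Split "yyyy-ww" strings into year and week, then convert
parts <- strsplit(c("2009-52", "2009-53", "2010-01"), "-", fixed = TRUE)
year  <- as.integer(vapply(parts, `[`, character(1), 1))
week  <- as.integer(vapply(parts, `[`, character(1), 2))
iso_week_monday(year, week)
# [1] "2009-12-21" "2009-12-28" "2010-01-04"
```

This avoids `%W` and `%V` on input entirely, so it should not depend on the platform's `strptime` behaviour.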
Roland
dynamo