2

I am plotting weekly figures that cross over from 2018 into 2019 and the tick marks on my X-axis represent the year then week.

For example:

2018-50, 2018-51, 2018-52, 2018-53, 2019-01, 2019-02, 2019-03

I have two data frames and the dates in either aren't always going to be the same. As such, one solution I have thought of that might work is to find the lowest yearWeek value in either data frame, and the maximum yearWeek value in either data frame, and to then create a sequence using those two values. Note that both values could either exist within a single data frame or one data frame could have the lowest/earliest value and the other the highest/latest value.

Both data frames look like this:

  week yearWeek      month  day       date
1   31  2018-31 2018-08-01  Wed 2018-08-01
2   31  2018-31 2018-08-01  Thu 2018-08-02
3   31  2018-31 2018-08-01  Fri 2018-08-03
4   31  2018-31 2018-08-01  Sat 2018-08-04
5   32  2018-32 2018-08-01  Sun 2018-08-05
6   32  2018-32 2018-08-01  Mon 2018-08-06

I have looked for a solution and this answer is almost there, but not quite.

The problems with this solution are:

  • The single-figure week number don't have a 0 before them; and
  • Despite specifying seq(31:53), for example, the output starts from 1 (I know why this happens); and
  • There doesn't seem to be a way to stop the count at 53 using this method (2018 had a (short) 53rd week which I would like to include) and resume from 2019-01 onwards.

I want to be able to set the X-axis range from 2018-31 (31st week of 2018) to 2019-13 (13th week of 2019).

Something like this:

enter image description here

In short, how can I create a sequence of year-week values ranging from the minimum date value to the maximum date value (in this case 2018-31-2019-13)?

Mus
  • 7,290
  • 24
  • 86
  • 130
  • The edited part is important, you should have included it earlier in the question. Could you post `head` of your dataframe and how does it look? If both of the dataframes are similar you can include any one of it. – Ronak Shah Apr 24 '19 at 10:05
  • Question adapted and head of data frame added (they share the same format / are identical). – Mus Apr 24 '19 at 10:13
  • are you sure that the date spans between 2018 and 2019? or that needs to be identified from the dataframes as well ? – Ronak Shah Apr 24 '19 at 10:17
  • The dates span from 2018 to 2019 for now; however, future datasets that are passed into the script I am building will naturally have different date ranges, meaning that the dates will indeed need to be identified for this to work correctly every time. – Mus Apr 24 '19 at 10:19

4 Answers4

2

I think this would work for you

x1 <- c(31:53)
x2 <- sprintf("%02d", c(1:13))
paste(c(rep(2018, length(x1)), rep(2019, length(x2))), c(x1, x2), sep = "-")

# [1] "2018-31" "2018-32" "2018-33" "2018-34" "2018-35" "2018-36" "2018-37" 
#     "2018-38" "2018-39" "2018-40" "2018-41" "2018-42" "2018-43" "2018-44" 
#     "2018-45" "2018-46" "2018-47" "2018-48" "2018-49" "2018-50" "2018-51" 
#     "2018-52" "2018-53" "2019-01" "2019-02" "2019-03" "2019-04" "2019-05" 
# "2019-06" "2019-07" "2019-08" "2019-09" "2019-10" "2019-11" "2019-12" "2019-13"

For the updated question we can do

#rbind both the dataset
df <- rbind(df1, df2)

#convert them to date
df$Date <- as.Date(df$date)

#Generate a sequence from min date to maximum date, format them 
# to year-week combination and select only the unique ones
unique(format(seq(min(df$Date), max(df$Date), by = "day"), "%Y-%W"))
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
  • Thank you, this looks good however I have updated my question with some information that might be helpful. The date range has to span the minimum and maximum dates across two data frames (it doesn't matter if both exist within the same data frame or not. As such this needs to be checked first.). How might I do this? – Mus Apr 24 '19 at 10:06
  • @MusTheDataGuy Updated the answer. See if it helps. – Ronak Shah Apr 24 '19 at 10:55
  • This is *almost* perfect! It works nicely, however there is one output value that sticks out for me: `2019-00`. Any ideas why this might be and/or how to rectify it? I have tried changing the date range and it appears that some years begins with `20XX-00`. – Mus Apr 24 '19 at 11:09
  • @MusTheDataGuy that is how R recognises the week. Check `format(as.Date("2019-01-01"), "%Y-%W")` and `format(as.Date("2018-12-31"), "%Y-%W")` . R recognises it as week 0 and week 53 respectively. If you want it as week 1 then you need to identify those 0's and add 1 to every week number of that year. – Ronak Shah Apr 24 '19 at 11:27
  • Please remember that R recognises three different conventions for numbering the week of the year, two of which include week zero (`%U`, `%W`), and one which doesn't (`%V`). See `?strptime`. – Henrik Apr 24 '19 at 11:29
  • @MusTheDataGuy as commented by @Henrik if you use `unique(format(seq(min(df$Date), max(df$Date), by = "day"), "%Y-%V"))` it should give you the expected output you want. – Ronak Shah Apr 24 '19 at 12:33
0

Define two sequences, and then restrict to the range you want:

years <- c("2018", "2019")
months <- sprintf("%02d", c(1:52))

result <- apply(expand.grid(years, months), 1, function(x) paste(x,collapse="-"))
result <- result[result >= "2018-31" & result <= "2019-13"]
result

 [1] "2019-01" "2019-02" "2019-03" "2019-04" "2019-05" "2019-06" "2019-07"
 [8] "2019-08" "2019-09" "2019-10" "2019-11" "2019-12" "2019-13" "2018-31"
[15] "2018-32" "2018-33" "2018-34" "2018-35" "2018-36" "2018-37" "2018-38"
[22] "2018-39" "2018-40" "2018-41" "2018-42" "2018-43" "2018-44" "2018-45"
[29] "2018-46" "2018-47" "2018-48" "2018-49" "2018-50" "2018-51" "2018-52"

Note that the pruning off of dates we don't want works here even using text date strings, because all dates are fixed width strings, and are left zero padded, if necessary. So, sorting therefore works as it would for actual numbers.

Tim Biegeleisen
  • 502,043
  • 27
  • 286
  • 360
  • I see what you have done here, but I need to account for the partial weeks (53rd week) in each year too where relevant. (Also, it seems that your output of `result` doesn't account for the specific weeks - I suspect that `result <- result[result >= "2018-31" & result <= "2019-13"]` wasn't executed.) – Mus Apr 24 '19 at 10:26
  • I don't understand your comment, and my answer in fact generates all year-weeks from `2018-31` to `2019-13` inclusive. If you have some other requirement, then you should have stated that at the very beginning. – Tim Biegeleisen Apr 24 '19 at 10:29
  • If you take a closer look you will see that I *have* stated my requirements - I specifically mention the 53rd week in the third bullet point, which has been there since the very beginning (`(2018 had a (short) 53rd week which I would like to include)`). I now see that I misread the output as the weeks listed are between `2018-31` and `2019-13`. Your code does indeed generate 52 individual weeks for the years 2018 and 2019 - I do not dispute that - but what I require the 53rd week where applicable; the output you have posted does not list it and concludes 2018 at `2018-52`. – Mus Apr 24 '19 at 10:36
  • You are opening a can of worms here. Of course, most years actually have a 53rd week. But, the thing is, you need to define how you want to count your year weeks. Every tool/language handles this in different ways. This requirement greatly complicates your question. – Tim Biegeleisen Apr 24 '19 at 10:38
  • It may well be a can of worms, and I am aware that most years contain a 53rd week and that they aren't full weeks, but nevertheless it is something I have to account for. I have been using `data.table::week()` to extract the week from the `date` value at present - could something like this be implemented in your answer? – Mus Apr 24 '19 at 10:41
0

here is a possibility using the str_pad function from the stringr package:

weeks <- str_pad(41:65 %% 53 + 1, 2, "left", "0")
years <- ifelse(41:65 <= 52, "2018", "2019")
paste(years, weeks, sep = "-")
     [1] "2018-42" "2018-43" "2018-44" "2018-45" "2018-46" "2018-47" "2018-48" "2018-49" "2018-50" "2018-51" "2018-52" "2018-53" "2019-01" "2019-02" "2019-03" "2019-04" "2019-05" "2019-06" "2019-07" "2019-08" "2019-09"
[22] "2019-10" "2019-11" "2019-12" "2019-13"

As I just learned from the other two answers sprintf provides a base alternative to str_pad. So you can also use

weeks <- sprintf("%02d", 41:65 %% 53 + 1)
Cettt
  • 11,460
  • 7
  • 35
  • 58
0

Here is a possibility using strftime:

weeks <- seq(from = ISOdate(2018,12,10), to = ISOdate(2019,4,1), by="week")
strftime(weeks,format="%Y-%W") 
markus
  • 25,843
  • 5
  • 39
  • 58