1

I have some stock return daily data need to turn into weekly format. As you know stock trade only on Monday to Friday, I need to add up each days return to get cumulative weekly return.

I have thought about using lubridate's week function but how does lubridate knows when is the start of the week? How do I make lubridate to recognize the week using weekday function i.e. "Monday" to "Friday" is one week?

I have thought about writing a loop for example: If "Monday" to "Friday" is in the data, then I will call this one week. But for the second week, what shall I use for R to know we are entering second week? Then when we reach year end and we have 52 weeks, how to reset the week count so we are entering the next year?

Here is the dput:

dat = structure(list(date = structure(c(4019, 4022, 4023, 4024, 4025, 
4026, 4029, 4030, 4031, 4032, 4033, 4036, 4037, 4038, 4039, 4040, 
4043, 4044, 4045, 4046, 4047, 4050, 4051, 4052, 4053, 4054, 4057, 
4058, 4059, 4060, 4061, 4065, 4066, 4067, 4068, 4071, 4072, 4073, 
4074, 4075), class = "Date"), weekday = c("Friday", "Monday", 
"Tuesday", "Wednesday", "Thursday", "Friday", "Monday", "Tuesday", 
"Wednesday", "Thursday", "Friday", "Monday", "Tuesday", "Wednesday", 
"Thursday", "Friday", "Monday", "Tuesday", "Wednesday", "Thursday", 
"Friday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", 
"Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Tuesday", 
"Wednesday", "Thursday", "Friday", "Monday", "Tuesday", "Wednesday", 
"Thursday", "Friday"), COMP = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L
), week = c(1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 
4, 4, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 8, 8, 
9, 9), RET = c(-0.005435, 0.040984, -0.015748, -0.021333, 0.002725, 
0.01087, 0.024194, -0.002625, 0.013158, 0.033766, 0, -0.007538, 
-0.005063, 0, -0.002545, 0.015306, 0.017588, -0.007407, 0.024876, 
-0.009709, 0, -0.029412, 0.010101, 0.0075, -0.004963, 0.027431, 
-0.002427, 0.007299, -0.009662, -0.004878, 0.014706, -0.004831, 
0.004854, -0.009662, -0.021951, -0.014963, 0.005063, -0.005038, 
0.010127, 0)), .Names = c("date", "weekday", "COMP", "week", 
"RET"), row.names = c(NA, -40L), class = c("data.table", "data.frame"
))

library(data.table)
setDT(dat)

Here are two month worth of data from 1981-01-02 to 1981-02-27 of company 1's daily return. Lets ignore to calculate the return at the moment and focus on the time first.

week column is generated by weeks() function. As you can see week is not what I wanted, it starts from wednesday and end in wednesday.

weekday is generated by weekdays() function.

I want to make e.g. 1981-01-02 as week 1 (since we have friday here only), 1981-01-05 to 1981-01-09 as week 2 vice versa.

Frank
  • 66,179
  • 8
  • 96
  • 180
Gabriel
  • 423
  • 6
  • 21
  • You should look here on how to make reproducible example and how to better ask a question so we could help you :) https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – DJV Jun 05 '18 at 18:18
  • Hi, added the dput of my dataframe. Thanks – Gabriel Jun 05 '18 at 18:36

4 Answers4

3

Using lubridate you can use isoweek to define the week column.

library(lubridate)
df[, wk := isoweek(date)]

Which gives you

#           date   weekday COMP week       RET wk
#  1: 1981-01-02    Friday    1    1 -0.005435  1
#  2: 1981-01-05    Monday    1    1  0.040984  2
#  3: 1981-01-06   Tuesday    1    1 -0.015748  2
#  4: 1981-01-07 Wednesday    1    1 -0.021333  2
#  5: 1981-01-08  Thursday    1    2  0.002725  2
#  6: 1981-01-09    Friday    1    2  0.010870  2
#  7: 1981-01-12    Monday    1    2  0.024194  3
#  8: 1981-01-13   Tuesday    1    2 -0.002625  3
#  9: 1981-01-14 Wednesday    1    2  0.013158  3
# 10: 1981-01-15  Thursday    1    3  0.033766  3
# 11: 1981-01-16    Friday    1    3  0.000000  3
# 12: 1981-01-19    Monday    1    3 -0.007538  4
# 13: 1981-01-20   Tuesday    1    3 -0.005063  4
# 14: 1981-01-21 Wednesday    1    3  0.000000  4
# 15: 1981-01-22  Thursday    1    4 -0.002545  4
# 16: 1981-01-23    Friday    1    4  0.015306  4
# 17: 1981-01-26    Monday    1    4  0.017588  5
# 18: 1981-01-27   Tuesday    1    4 -0.007407  5
# 19: 1981-01-28 Wednesday    1    4  0.024876  5
# 20: 1981-01-29  Thursday    1    5 -0.009709  5

Using dplyr, you can add the week column with

library(dplyr)
df %>% 
  mutate(wk = isoweek(date))
IceCreamToucan
  • 28,083
  • 2
  • 22
  • 38
Dosko
  • 46
  • 3
  • You created a new table, which I guess could be combined with the OP's table with a left_join(op_data, your_result)? – Frank Jun 05 '18 at 19:23
  • This seems like the best solution really. OP's data has no Sat/Sun so you could just do `dt[, wk := isoweek(date)]` – IceCreamToucan Jun 05 '18 at 19:25
  • It also has the benefits that (a) data doesn't need to be sorted by date and (b) It still works even if some weeks are missing the Monday row – IceCreamToucan Jun 05 '18 at 19:32
  • Yes, you are right @Frank. I am fairly new to this and was somehow not able to reproduce the OP's dataset. – Dosko Jun 05 '18 at 19:33
  • @Ryan I agree this is the best way, since your answer and mine rely on luck / assumptions (no gaps ... additionally, mine breaks if there are duplicate Mondays, and yours if the data is not sorted.) Dosik, np, it's a weird data.table quirk/trick; I'll edit the OP sot it's copy-pastable; didn't think of it before. – Frank Jun 05 '18 at 19:34
  • Hi, in fact some company have missing observation in some time. i.e. missing return on wednesday. So Frank you mean this will not work right? – Gabriel Jun 05 '18 at 20:14
  • In Stata, they treat date as a integer so for example starting at 1-1-1960 it will be recorded as 1 and 2-1-1960 will be 2 and add up like that. If R can do that we might devise a formula so we can group the days into weeks together – Gabriel Jun 05 '18 at 20:26
  • 1
    @Gabriel the `isoweek` function calculates weeks in the way you're suggesting, and will work even with missing/unordered dates. – IceCreamToucan Jun 06 '18 at 13:45
1

If you want to count Mondays since the start of the data set...

DT[, wk := {
  w = DT[weekday == "Monday"][DT, on=.(date), roll=TRUE, which = TRUE]
  if (anyNA(w)) 
    1L + replace(w, is.na(w), 0L) 
  else 
    w
}]

How it works

We are doing a rolling join of every row of DT onto the subset of DT where weekday == "Monday" rolling to the most recent date in the subset (on = .(date), roll = TRUE) and identifying which row numbers in the subset we land on (which = TRUE).

If the first day is not a Monday, we'll have missing values (for all days before the first Monday) and will want to replace them with ones and to increment all other row numbers by one.


Oh, I guess there's also

DT[, wk := (first(weekday) != "Monday") + cumsum(weekday == "Monday")]

... since the logical condition first(weekday) != "Monday" is 0 if FALSE, 1 if TRUE.

Frank
  • 66,179
  • 8
  • 96
  • 180
  • 1
    I would rather go with something using `round.IDate(date, "weeks")`, but I couldn't understand its output ... maybe because of this currently outstanding issue https://github.com/Rdatatable/data.table/issues/2611 or maybe just because I didn't read the docs carefully enough. – Frank Jun 05 '18 at 19:04
1
dat[, wk := .GRP, cut(date, 'week')]
head(dat, 20)

#           date   weekday COMP week       RET wk
#  1: 1981-01-02    Friday    1    1 -0.005435  1
#  2: 1981-01-05    Monday    1    1  0.040984  2
#  3: 1981-01-06   Tuesday    1    1 -0.015748  2
#  4: 1981-01-07 Wednesday    1    1 -0.021333  2
#  5: 1981-01-08  Thursday    1    2  0.002725  2
#  6: 1981-01-09    Friday    1    2  0.010870  2
#  7: 1981-01-12    Monday    1    2  0.024194  3
#  8: 1981-01-13   Tuesday    1    2 -0.002625  3
#  9: 1981-01-14 Wednesday    1    2  0.013158  3
# 10: 1981-01-15  Thursday    1    3  0.033766  3
# 11: 1981-01-16    Friday    1    3  0.000000  3
# 12: 1981-01-19    Monday    1    3 -0.007538  4
# 13: 1981-01-20   Tuesday    1    3 -0.005063  4
# 14: 1981-01-21 Wednesday    1    3  0.000000  4
# 15: 1981-01-22  Thursday    1    4 -0.002545  4
# 16: 1981-01-23    Friday    1    4  0.015306  4
# 17: 1981-01-26    Monday    1    4  0.017588  5
# 18: 1981-01-27   Tuesday    1    4 -0.007407  5
# 19: 1981-01-28 Wednesday    1    4  0.024876  5
# 20: 1981-01-29  Thursday    1    5 -0.009709  5

Note: This is the same result as dt[, wk := lubridate::isoweek(date)], unless the data isn't ordered by date. In that case, my solution groups the weeks the same way but wk won't be in ascending order. First week might be given a 6, etc.

IceCreamToucan
  • 28,083
  • 2
  • 22
  • 38
0

Here's a simpler way (easier to understand I suppose) to solve this:

# if its a monday, mark as 1, 2, 3 and so on
dt[weekday == 'Monday', is_week := seq(.N)]

# forward fill the missing values
library(zoo)
dt[, is_week := na.locf(is_week,na.rm = F, fromLast = F)]
dt[is.na(is_week), is_week := 0]

# find weekly average return
dt[, mean(RET), is_week]

   is_week           V1
1:       0 -0.005435000
2:       1  0.003499600
3:       2  0.013698600
4:       3  0.000032000
5:       4  0.005069600
6:       5  0.002131400
7:       6 -0.002950222
8:       7 -0.000962200
YOLO
  • 20,181
  • 5
  • 20
  • 40