Here is a data.table
solution which uses a "grouped shift()
" and multiple joins to copy value
from the same hour
of the PreviousDaySameDayCategory
.
Create reproducible data
OP's code to create reproducible data was not fully reproducible because he used the weekdays()
function which returns the weekday names in the current locale (which seems to be Spanish for the OP). To be independent of the current locale, I switched to format(Date, "%u")
which returns the numbers 1 to 7 for Monday to Sunday. Furthermore, the fct_collapse()
from the forcats
package is used to collapse the days 2, 3, and 4 (Tuesday to Thursday) into one factor level.
library(data.table)
# note that package lubridate is not required
myDT <- data.table(Date.POSIXct = seq(as.POSIXct("2018-05-01"),
as.POSIXct("2018-05-31"), "hour"))
myDT[, Date := as.Date(Date.POSIXct)]
myDT[, Weekday := format(Date, "%u")]
myDT[, DayCategory := forcats::fct_collapse(Weekday, "234" = c("2", "3", "4"))]
myDT[, hour := hour(Date.POSIXct)]
myDT[, Holidays := c(rep(0, 24 * 7), rep(1, 24 * 7), rep(0, 24 * 16 + 1))]
set.seed(123)
myDT[, myvalue := sample.int(101, size = nrow(mydf), replace = TRUE)]
Intermediate step: PreviousDaySameDayCategory
The sample data set consists of hourly data but in order to determine the PreviousDaySameDayCategory
we need to work day-wise and thus only have to deal with the unique values of Date
, DayCategory
, and Holidays
. The data is grouped by DayCategory
and the Holidays
indicator. For each group separately, the previous day is picked by lagging Date
. As the result of shift()
operations depend on the order of rows the dataset has been ordered before shifting.
tmp <- unique(myDT[order(Date), .(Date, DayCategory, Holidays)])[
, .(Date, PreviousDaySameDayCategory = shift(Date)), by = .(DayCategory, Holidays)][
order(Date)]
tmp
DayCategory Holidays Date PreviousDaySameDayCategory
1: 234 0 2018-05-01 <NA>
2: 234 0 2018-05-02 2018-05-01
3: 234 0 2018-05-03 2018-05-02
4: 5 0 2018-05-04 <NA>
5: 6 0 2018-05-05 <NA>
6: 7 0 2018-05-06 <NA>
7: 1 0 2018-05-07 <NA>
8: 234 1 2018-05-08 <NA>
9: 234 1 2018-05-09 2018-05-08
10: 234 1 2018-05-10 2018-05-09
11: 5 1 2018-05-11 <NA>
12: 6 1 2018-05-12 <NA>
13: 7 1 2018-05-13 <NA>
14: 1 1 2018-05-14 <NA>
15: 234 0 2018-05-15 2018-05-03
16: 234 0 2018-05-16 2018-05-15
17: 234 0 2018-05-17 2018-05-16
18: 5 0 2018-05-18 2018-05-04
19: 6 0 2018-05-19 2018-05-05
20: 7 0 2018-05-20 2018-05-06
21: 1 0 2018-05-21 2018-05-07
22: 234 0 2018-05-22 2018-05-17
23: 234 0 2018-05-23 2018-05-22
24: 234 0 2018-05-24 2018-05-23
25: 5 0 2018-05-25 2018-05-18
26: 6 0 2018-05-26 2018-05-19
27: 7 0 2018-05-27 2018-05-20
28: 1 0 2018-05-28 2018-05-21
29: 234 0 2018-05-29 2018-05-24
30: 234 0 2018-05-30 2018-05-29
31: 234 0 2018-05-31 2018-05-30
DayCategory Holidays Date PreviousDaySameDayCategory
For days 3 and 4 (Wednesdays and Thursday) the preceeding Tuesday and Wednesday, resp., of the same week are picked. For day 2 (Tuesday) the preceeding Thursday of the preceeding week is picked if both weeks have the same holiday indicator set. If the preceeding week has a different holiday indicator the most recent Thursday of the same holiday period is picked. This is why, e.g., the 2018-05-03 is picked in row 15.
Copying value
from matching PreviousDaySameDayCategory
This is done in two steps. First, the hourly values are picked from the matching PreviousDaySameDayCategory
by joining with the matching days table tmp
:
tmp2 <- myDT[tmp, on = .(Date = PreviousDaySameDayCategory), .(Date = i.Date, hour, myvalue), nomatch = 0L]
tmp2
Date hour myvalue
1: 2018-05-02 0 30
2: 2018-05-02 1 80
3: 2018-05-02 2 42
4: 2018-05-02 3 90
5: 2018-05-02 4 95
---
500: 2018-05-31 19 39
501: 2018-05-31 20 1
502: 2018-05-31 21 1
503: 2018-05-31 22 101
504: 2018-05-31 23 11
Second, a new column previousValue
in myDT
is created by updating in a join which contains the corresponding value from PreviousDaySameDayCategory
:
myDT[tmp2, on = .(Date, hour), previousValue := i.myvalue]
Here, the first two days of the result are shown:
myDT[Date %between% c(as.Date("2018-05-01"), as.Date("2018-05-02"))]
Date.POSIXct Date Weekday DayCategory hour Holidays myvalue previousValue
1: 2018-05-01 00:00:00 2018-05-01 2 234 0 0 30 NA
2: 2018-05-01 01:00:00 2018-05-01 2 234 1 0 80 NA
3: 2018-05-01 02:00:00 2018-05-01 2 234 2 0 42 NA
4: 2018-05-01 03:00:00 2018-05-01 2 234 3 0 90 NA
5: 2018-05-01 04:00:00 2018-05-01 2 234 4 0 95 NA
6: 2018-05-01 05:00:00 2018-05-01 2 234 5 0 5 NA
7: 2018-05-01 06:00:00 2018-05-01 2 234 6 0 54 NA
8: 2018-05-01 07:00:00 2018-05-01 2 234 7 0 91 NA
9: 2018-05-01 08:00:00 2018-05-01 2 234 8 0 56 NA
10: 2018-05-01 09:00:00 2018-05-01 2 234 9 0 47 NA
11: 2018-05-01 10:00:00 2018-05-01 2 234 10 0 97 NA
12: 2018-05-01 11:00:00 2018-05-01 2 234 11 0 46 NA
13: 2018-05-01 12:00:00 2018-05-01 2 234 12 0 69 NA
14: 2018-05-01 13:00:00 2018-05-01 2 234 13 0 58 NA
15: 2018-05-01 14:00:00 2018-05-01 2 234 14 0 11 NA
16: 2018-05-01 15:00:00 2018-05-01 2 234 15 0 91 NA
17: 2018-05-01 16:00:00 2018-05-01 2 234 16 0 25 NA
18: 2018-05-01 17:00:00 2018-05-01 2 234 17 0 5 NA
19: 2018-05-01 18:00:00 2018-05-01 2 234 18 0 34 NA
20: 2018-05-01 19:00:00 2018-05-01 2 234 19 0 97 NA
21: 2018-05-01 20:00:00 2018-05-01 2 234 20 0 90 NA
22: 2018-05-01 21:00:00 2018-05-01 2 234 21 0 70 NA
23: 2018-05-01 22:00:00 2018-05-01 2 234 22 0 65 NA
24: 2018-05-01 23:00:00 2018-05-01 2 234 23 0 101 NA
25: 2018-05-02 00:00:00 2018-05-02 3 234 0 0 67 30
26: 2018-05-02 01:00:00 2018-05-02 3 234 1 0 72 80
27: 2018-05-02 02:00:00 2018-05-02 3 234 2 0 55 42
28: 2018-05-02 03:00:00 2018-05-02 3 234 3 0 61 90
29: 2018-05-02 04:00:00 2018-05-02 3 234 4 0 30 95
30: 2018-05-02 05:00:00 2018-05-02 3 234 5 0 15 5
31: 2018-05-02 06:00:00 2018-05-02 3 234 6 0 98 54
32: 2018-05-02 07:00:00 2018-05-02 3 234 7 0 92 91
33: 2018-05-02 08:00:00 2018-05-02 3 234 8 0 70 56
34: 2018-05-02 09:00:00 2018-05-02 3 234 9 0 81 47
35: 2018-05-02 10:00:00 2018-05-02 3 234 10 0 3 97
36: 2018-05-02 11:00:00 2018-05-02 3 234 11 0 49 46
37: 2018-05-02 12:00:00 2018-05-02 3 234 12 0 77 69
38: 2018-05-02 13:00:00 2018-05-02 3 234 13 0 22 58
39: 2018-05-02 14:00:00 2018-05-02 3 234 14 0 33 11
40: 2018-05-02 15:00:00 2018-05-02 3 234 15 0 24 91
41: 2018-05-02 16:00:00 2018-05-02 3 234 16 0 15 25
42: 2018-05-02 17:00:00 2018-05-02 3 234 17 0 42 5
43: 2018-05-02 18:00:00 2018-05-02 3 234 18 0 42 34
44: 2018-05-02 19:00:00 2018-05-02 3 234 19 0 38 97
45: 2018-05-02 20:00:00 2018-05-02 3 234 20 0 16 90
46: 2018-05-02 21:00:00 2018-05-02 3 234 21 0 15 70
47: 2018-05-02 22:00:00 2018-05-02 3 234 22 0 24 65
48: 2018-05-02 23:00:00 2018-05-02 3 234 23 0 48 101
Date.POSIXct Date Weekday DayCategory hour Holidays myvalue previousValue
Verification
The result is in line with OP's expectations
identical(myDT[, previousValue[seq_along(solution)]], solution)
[1] TRUE