Firstly, I checked the similar questions and answers but couldn't find an inplace solution for implementing solutions for a huge table. As a toy data, my data structure is like this;
product location date value
<chr> <chr> <date> <int>
1 A X 2022-01-01 22
2 A Y 2022-01-03 82
3 B Y 2022-01-03 27
4 C Z 2022-01-01 74
5 C Z 2022-01-02 58
6 C Z 2022-01-03 68
7 C Z 2022-01-04 75
To reproduce;
df <- structure(list(product = c("A", "A", "B", "C", "C", "C", "C"),
location = c("X", "Y", "Y", "Z", "Z", "Z", "Z"), date = structure(c(18993,
18995, 18995, 18993, 18994, 18995, 18996), class = "Date"),
value = c(22L, 82L, 27L, 74L, 58L, 68L, 75L)), row.names = c(NA,
-7L), class = c("data.table", "data.frame"))
I want to duplicate each product&location combination for a date range (2022-01-01
to 2022-01-04
) and fill value
column with zero.
The desired output should look like;
product location date value
<chr> <chr> <date> <int>
1 A X 2022-01-01 22
2 A X 2022-01-02 0
3 A X 2022-01-03 0
4 A X 2022-01-04 0
5 A Y 2022-01-01 0
6 A Y 2022-01-02 0
7 A Y 2022-01-03 82
8 A Y 2022-01-04 0
9 B Y 2022-01-01 0
10 B Y 2022-01-02 0
11 B Y 2022-01-03 27
12 B Y 2022-01-04 0
13 C Z 2022-01-01 74
14 C Z 2022-01-02 58
15 C Z 2022-01-03 68
16 C Z 2022-01-04 75
I am familiar with the solutions by tidyr::complete
and making distinct tables and cross join them, but my original data is very huge and these solutions are memory inefficient for me. I also tried to split data into smaller parts then fill their missing dates with zero but again in that time I couldn't reunion them because of the memory issue.
How can I fill the missing dates with zero in an inplace way ?
Thanks in advance.