Expanding the data.table using R

Question

I have a data.table called temp_dt given below. In this table, the column spots has some unique values for each date column. I want to expand this table so that all spot values are available for each of these date rows (grouped by column type).

One of the solutions might be to dcast and then melt the data.table but that is too slow as my original data.table is too large. Can there be a faster solution?

After expanding, there should be 84 rows in this table as there are 21 unique spots and 4 unique type.

temp_dt = structure(list(date = structure(c(18318L, 18318L, 18318L, 18318L, 
                                            18318L, 18318L, 18318L, 18318L, 18318L, 18318L, 18318L, 18318L, 
                                            18318L, 18318L, 18318L, 18318L, 18318L, 18318L, 18318L, 18318L, 
                                            18318L, 18318L, 18360L, 18360L, 18360L, 18360L, 18360L, 18360L, 
                                            18360L, 18360L, 18360L, 18360L, 18360L, 18360L, 18360L, 18360L, 
                                            18360L, 18360L, 18360L, 18360L, 18360L, 18360L, 18360L, 18360L, 
                                            18360L, 18360L, 18360L, 18360L, 18360L, 18360L, 18360L, 18360L, 
                                            18360L, 18360L, 18360L, 18360L, 18360L, 18360L, 18360L), 
                                          class = c("IDate", "Date")), 
                         spots = c(35, 40, 45, 50, 52.5, 55, 57.5, 60, 65, 70, 
                                   75, 30, 35, 40, 45, 50, 52.5, 55, 57.5, 60, 65, 70, 25, 27.5, 
                                   30, 32.5, 35, 37.5, 40, 42.5, 45, 47.5, 50, 52.5, 55, 57.5, 60, 
                                   70, 75, 15, 20, 22.5, 25, 27.5, 30, 32.5, 35, 37.5, 40, 42.5, 
                                   45, 47.5, 50, 52.5, 55, 57.5, 60, 65, 70), 
                         price = c(56.57, 56.57, 
                                   56.57, 56.57, 56.57, 56.57, 56.57, 56.57, 56.57, 56.57, 56.57, 
                                   56.57, 56.57, 56.57, 56.57, 56.57, 56.57, 56.57, 56.57, 56.57, 
                                   56.57, 56.57, 39.65, 39.65, 39.65, 39.65, 39.65, 39.65, 39.65, 
                                   39.65, 39.65, 39.65, 39.65, 39.65, 39.65, 39.65, 39.65, 39.65, 
                                   39.65, 39.65, 39.65, 39.65, 39.65, 39.65, 39.65, 39.65, 39.65, 
                                   39.65, 39.65, 39.65, 39.65, 39.65, 39.65, 39.65, 39.65, 39.65, 
                                   39.65, 39.65, 39.65), 
                         type = c("a", "a", "a", "a", "a", "a", 
                                  "a", "a", "a", "a", "a", "b", "b", "b", "b", "b", "b", "b", "b", 
                                  "b", "b", "b", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", 
                                  "a", "a", "a", "a", "a", "a", "a", "b", "b", "b", "b", "b", "b", 
                                  "b", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b", 
                                  "b")), row.names = c(NA, -59L), class = c("data.table", "data.frame"
                                  ))
> temp_dt
          date spots price type
 1: 2020-02-26  35.0 56.57    a
 2: 2020-02-26  40.0 56.57    a
 3: 2020-02-26  45.0 56.57    a
 4: 2020-02-26  50.0 56.57    a
 5: 2020-02-26  52.5 56.57    a
 6: 2020-02-26  55.0 56.57    a
 7: 2020-02-26  57.5 56.57    a
 8: 2020-02-26  60.0 56.57    a
 9: 2020-02-26  65.0 56.57    a
10: 2020-02-26  70.0 56.57    a
11: 2020-02-26  75.0 56.57    a
12: 2020-02-26  30.0 56.57    b
13: 2020-02-26  35.0 56.57    b
14: 2020-02-26  40.0 56.57    b
15: 2020-02-26  45.0 56.57    b
16: 2020-02-26  50.0 56.57    b
17: 2020-02-26  52.5 56.57    b
18: 2020-02-26  55.0 56.57    b
19: 2020-02-26  57.5 56.57    b
20: 2020-02-26  60.0 56.57    b
21: 2020-02-26  65.0 56.57    b
22: 2020-02-26  70.0 56.57    b
23: 2020-04-08  25.0 39.65    a
24: 2020-04-08  27.5 39.65    a
25: 2020-04-08  30.0 39.65    a
26: 2020-04-08  32.5 39.65    a
27: 2020-04-08  35.0 39.65    a
28: 2020-04-08  37.5 39.65    a
29: 2020-04-08  40.0 39.65    a
30: 2020-04-08  42.5 39.65    a
31: 2020-04-08  45.0 39.65    a
32: 2020-04-08  47.5 39.65    a
33: 2020-04-08  50.0 39.65    a
34: 2020-04-08  52.5 39.65    a
35: 2020-04-08  55.0 39.65    a
36: 2020-04-08  57.5 39.65    a
37: 2020-04-08  60.0 39.65    a
38: 2020-04-08  70.0 39.65    a
39: 2020-04-08  75.0 39.65    a
40: 2020-04-08  15.0 39.65    b
41: 2020-04-08  20.0 39.65    b
42: 2020-04-08  22.5 39.65    b
43: 2020-04-08  25.0 39.65    b
44: 2020-04-08  27.5 39.65    b
45: 2020-04-08  30.0 39.65    b
46: 2020-04-08  32.5 39.65    b
47: 2020-04-08  35.0 39.65    b
48: 2020-04-08  37.5 39.65    b
49: 2020-04-08  40.0 39.65    b
50: 2020-04-08  42.5 39.65    b
51: 2020-04-08  45.0 39.65    b
52: 2020-04-08  47.5 39.65    b
53: 2020-04-08  50.0 39.65    b
54: 2020-04-08  52.5 39.65    b
55: 2020-04-08  55.0 39.65    b
56: 2020-04-08  57.5 39.65    b
57: 2020-04-08  60.0 39.65    b
58: 2020-04-08  65.0 39.65    b
59: 2020-04-08  70.0 39.65    b
          date spots price type

Related: [data.table equivalent of tidyr::complete()](https://stackoverflow.com/questions/43483497/data-table-equivalent-of-tidyrcomplete) — Henrik, Mar 14 '23 at 19:19

TimTeaFan · Accepted Answer · 2023-03-14T18:48:50.513

Would it be ok to expand the grid and then join the data to this empty grid? We can use data.table::CJ() for this:

library(data.table)

# an older version of this answer used `expand.grid()` instead of `data.table::CJ()`:
all_combs <- CJ(date = unique(temp_dt$date),
                spots = unique(temp_dt$spots),
                type = unique(temp_dt$type))

merge(all_combs,
      temp_dt,
      by = c("date", "spots", "type"),
      all.x = TRUE)

#>           date spots type price
#>  1: 2020-02-26  15.0    a    NA
#>  2: 2020-02-26  15.0    b    NA
#>  3: 2020-02-26  20.0    a    NA
#>  4: 2020-02-26  20.0    b    NA
#>  5: 2020-02-26  22.5    a    NA
#>  6: 2020-02-26  22.5    b    NA
#>  7: 2020-02-26  25.0    a    NA
#>  8: 2020-02-26  25.0    b    NA
#>  9: 2020-02-26  27.5    a    NA
#> 10: 2020-02-26  27.5    b    NA
#> 11: 2020-02-26  30.0    a    NA
#> 12: 2020-02-26  30.0    b 56.57
#> 13: 2020-02-26  32.5    a    NA
#> 14: 2020-02-26  32.5    b    NA
#> 15: 2020-02-26  35.0    a 56.57
#> 16: 2020-02-26  35.0    b 56.57
#> 17: 2020-02-26  37.5    a    NA
#> 18: 2020-02-26  37.5    b    NA
#> 19: 2020-02-26  40.0    a 56.57
#> 20: 2020-02-26  40.0    b 56.57
#> 21: 2020-02-26  42.5    a    NA
#> 22: 2020-02-26  42.5    b    NA
#> 23: 2020-02-26  45.0    a 56.57
#> 24: 2020-02-26  45.0    b 56.57
#> 25: 2020-02-26  47.5    a    NA
#> 26: 2020-02-26  47.5    b    NA
#> 27: 2020-02-26  50.0    a 56.57
#> 28: 2020-02-26  50.0    b 56.57
#> 29: 2020-02-26  52.5    a 56.57
#> 30: 2020-02-26  52.5    b 56.57
#> 31: 2020-02-26  55.0    a 56.57
#> 32: 2020-02-26  55.0    b 56.57
#> 33: 2020-02-26  57.5    a 56.57
#> 34: 2020-02-26  57.5    b 56.57
#> 35: 2020-02-26  60.0    a 56.57
#> 36: 2020-02-26  60.0    b 56.57
#> 37: 2020-02-26  65.0    a 56.57
#> 38: 2020-02-26  65.0    b 56.57
#> 39: 2020-02-26  70.0    a 56.57
#> 40: 2020-02-26  70.0    b 56.57
#> 41: 2020-02-26  75.0    a 56.57
#> 42: 2020-02-26  75.0    b    NA
#> 43: 2020-04-08  15.0    a    NA
#> 44: 2020-04-08  15.0    b 39.65
#> 45: 2020-04-08  20.0    a    NA
#> 46: 2020-04-08  20.0    b 39.65
#> 47: 2020-04-08  22.5    a    NA
#> 48: 2020-04-08  22.5    b 39.65
#> 49: 2020-04-08  25.0    a 39.65
#> 50: 2020-04-08  25.0    b 39.65
#> 51: 2020-04-08  27.5    a 39.65
#> 52: 2020-04-08  27.5    b 39.65
#> 53: 2020-04-08  30.0    a 39.65
#> 54: 2020-04-08  30.0    b 39.65
#> 55: 2020-04-08  32.5    a 39.65
#> 56: 2020-04-08  32.5    b 39.65
#> 57: 2020-04-08  35.0    a 39.65
#> 58: 2020-04-08  35.0    b 39.65
#> 59: 2020-04-08  37.5    a 39.65
#> 60: 2020-04-08  37.5    b 39.65
#> 61: 2020-04-08  40.0    a 39.65
#> 62: 2020-04-08  40.0    b 39.65
#> 63: 2020-04-08  42.5    a 39.65
#> 64: 2020-04-08  42.5    b 39.65
#> 65: 2020-04-08  45.0    a 39.65
#> 66: 2020-04-08  45.0    b 39.65
#> 67: 2020-04-08  47.5    a 39.65
#> 68: 2020-04-08  47.5    b 39.65
#> 69: 2020-04-08  50.0    a 39.65
#> 70: 2020-04-08  50.0    b 39.65
#> 71: 2020-04-08  52.5    a 39.65
#> 72: 2020-04-08  52.5    b 39.65
#> 73: 2020-04-08  55.0    a 39.65
#> 74: 2020-04-08  55.0    b 39.65
#> 75: 2020-04-08  57.5    a 39.65
#> 76: 2020-04-08  57.5    b 39.65
#> 77: 2020-04-08  60.0    a 39.65
#> 78: 2020-04-08  60.0    b 39.65
#> 79: 2020-04-08  65.0    a    NA
#> 80: 2020-04-08  65.0    b 39.65
#> 81: 2020-04-08  70.0    a 39.65
#> 82: 2020-04-08  70.0    b 39.65
#> 83: 2020-04-08  75.0    a 39.65
#> 84: 2020-04-08  75.0    b    NA
#>           date spots type price

@akrun suggested a more compact version, which is really slick, in the comments:

nm1 <- c("date", "spots", "type")

temp_dt[temp_dt[, do.call("CJ", c(mget(nm1), unique = TRUE))], on = c(nm1)]
# same output as above

Data from OP:

temp_dt = structure(list(date = structure(c(18318L, 18318L, 18318L, 18318L, 
                                            18318L, 18318L, 18318L, 18318L, 18318L, 18318L, 18318L, 18318L, 
                                            18318L, 18318L, 18318L, 18318L, 18318L, 18318L, 18318L, 18318L, 
                                            18318L, 18318L, 18360L, 18360L, 18360L, 18360L, 18360L, 18360L, 
                                            18360L, 18360L, 18360L, 18360L, 18360L, 18360L, 18360L, 18360L, 
                                            18360L, 18360L, 18360L, 18360L, 18360L, 18360L, 18360L, 18360L, 
                                            18360L, 18360L, 18360L, 18360L, 18360L, 18360L, 18360L, 18360L, 
                                            18360L, 18360L, 18360L, 18360L, 18360L, 18360L, 18360L), 
                                          class = c("IDate", "Date")), 
                         spots = c(35, 40, 45, 50, 52.5, 55, 57.5, 60, 65, 70, 
                                   75, 30, 35, 40, 45, 50, 52.5, 55, 57.5, 60, 65, 70, 25, 27.5, 
                                   30, 32.5, 35, 37.5, 40, 42.5, 45, 47.5, 50, 52.5, 55, 57.5, 60, 
                                   70, 75, 15, 20, 22.5, 25, 27.5, 30, 32.5, 35, 37.5, 40, 42.5, 
                                   45, 47.5, 50, 52.5, 55, 57.5, 60, 65, 70), 
                         price = c(56.57, 56.57, 
                                   56.57, 56.57, 56.57, 56.57, 56.57, 56.57, 56.57, 56.57, 56.57, 
                                   56.57, 56.57, 56.57, 56.57, 56.57, 56.57, 56.57, 56.57, 56.57, 
                                   56.57, 56.57, 39.65, 39.65, 39.65, 39.65, 39.65, 39.65, 39.65, 
                                   39.65, 39.65, 39.65, 39.65, 39.65, 39.65, 39.65, 39.65, 39.65, 
                                   39.65, 39.65, 39.65, 39.65, 39.65, 39.65, 39.65, 39.65, 39.65, 
                                   39.65, 39.65, 39.65, 39.65, 39.65, 39.65, 39.65, 39.65, 39.65, 
                                   39.65, 39.65, 39.65), 
                         type = c("a", "a", "a", "a", "a", "a", 
                                  "a", "a", "a", "a", "a", "b", "b", "b", "b", "b", "b", "b", "b", 
                                  "b", "b", "b", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", 
                                  "a", "a", "a", "a", "a", "a", "a", "b", "b", "b", "b", "b", "b", 
                                  "b", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b", 
                                  "b")), row.names = c(NA, -59L), class = c("data.table", "data.frame"
                                  ))

^{Created on 2023-03-14 with reprex v2.0.2}

Thanks Tim, the solution looks good. I am testing it on my data. — Saurabh, Mar 14 '23 at 18:23
@Saurabh: I just updated my answer with the more performant `CJ()` function instead of `expand.grid()`. — TimTeaFan, Mar 14 '23 at 18:27
@akrun: Thanks for suggesting this, I added this to my answer. It's really short & clever. — TimTeaFan, Mar 14 '23 at 18:49

Expanding the data.table using R

1 Answers1