3

I want to keep empty groups (with a default value like NA or 0) when grouping by multiple conditions.

dt = data.table(user = c("A", "A", "B"), date = c("t1", "t2", "t1"), duration = c(1, 2, 1))
dt[, .("total" = sum(duration)), by = .(date, user)]

Result:

   date user total
1:   t1    A     1
2:   t2    A     2
3:   t1    B     1

Desired result:

   date user total
1:   t1    A     1
2:   t2    A     2
3:   t1    B     1
3:   t2    B    NA

One solution could be to add rows with 0 values before grouping, but it would require to create the Descartes product of many columns and manually checking if a value already exists for that combination, but I would prefer a built-in / simpler one.

Harlequin
  • 637
  • 6
  • 8
  • related: [inserting rows into data frame when values missing in category](https://stackoverflow.com/questions/46330683/inserting-rows-into-data-frame-when-values-missing-in-category) – IceCreamToucan Aug 15 '19 at 18:27

2 Answers2

6

You can try:

dt[CJ(user = user, date = date, unique = TRUE), on = .(user, date)]

   user date duration
1:    A   t1        1
2:    A   t2        2
3:    B   t1        1
4:    B   t2       NA
tmfmnk
  • 38,881
  • 4
  • 47
  • 67
2

Here is an option with complete from tidyr

library(tidyr)
library(dplyr)
dt1 <- dt[, .("total" = sum(duration)), by = .(date, user)]
dt1 %>%
    complete(user, date)
#   user  date  total
#  <chr> <chr> <dbl>
#  A     t1        1
#2 A     t2        2
#3 B     t1        1
#4 B     t2       NA

Or using dcast/melt

melt(dcast(dt, user ~ date, value.var = 'duration', sum), 
      id.var = 'user', variable.name = 'date', value.name = 'total')
akrun
  • 874,273
  • 37
  • 540
  • 662