Showing cells with zero instances of a factor in a summary table instead of omitting them

Question

Say we have a data frame with two variables, var1 and var2, each a factor with two levels:

library(dplyr)
df <- data.frame(var1 = factor(sample(c("A", "B"), 20, replace = TRUE)),
                 var2 = factor(rep(c("C", "D"), each = 10)))

When we summarise this dataframe

df %>% group_by(var1, var2) %>% summarise(count = n())

We get

# A tibble: 4 x 3
# Groups:   var1 [?]
  var1  var2  count
  <fct> <fct> <int>
1 A     C         5
2 A     D         4
3 B     C         5
4 B     D         6

But if we remove all rows containing one level of var2 (here D)

df2 <- df[1:10,]

And summarise

df2 %>% group_by(var1, var2) %>% summarise(count = n())

We get

# A tibble: 2 x 3
# Groups:   var1 [?]
  var1  var2  count
  <fct> <fct> <int>
1 A     C         5
2 B     C         5

The A-D and B-D cells are (unsurprisingly) not summarised because there are no longer any instances in these cells.

My question: is there a quick way to report these cells as 0 instead of omitting them from the summary table?

I know the D level of var2 is still embedded in that factor in df2 because

str(df2)

yields

'data.frame':   10 obs. of  2 variables:
 $ var1: Factor w/ 2 levels "A","B": 1 2 1 1 2 2 2 2 1 1
 $ var2: Factor w/ 2 levels "C","D": 1 1 1 1 1 1 1 1 1 1

So how do I get dplyr to report the 0 cells as well?

score 2 · Accepted Answer · answered Jan 11 '19 at 01:52

We may use complete() from tidyr (load it with library(tidyr)) along with ungroup(); without ungroup() we would get too many combinations:

df2 %>% group_by(var1, var2) %>% summarise(count = n()) %>% ungroup() %>%
  complete(var1, var2, fill = list(count = 0))
# A tibble: 4 x 3
#   var1  var2  count
#   <fct> <fct> <dbl>
# 1 A     C         3
# 2 A     D         0
# 3 B     C         7
# 4 B     D         0

Or complete() followed by distinct():

df2 %>% group_by(var1, var2) %>% summarise(count = n()) %>%
  complete(var1, var2, fill = list(count = 0)) %>% distinct()
# A tibble: 4 x 3
#   var1  var2  count
#   <fct> <fct> <dbl>
# 1 A     C         3
# 2 A     D         0
# 3 B     C         7
# 4 B     D         0
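
As a possible alternative, not part of the original answer: dplyr 0.8.0 and later accept a `.drop` argument in group_by(), and setting it to FALSE keeps groups for factor-level combinations that have no rows. The sketch below assumes such a dplyr version; the df2 in it is a stand-in rebuilt from the str() output shown in the question, so it should give 5 A-C and 5 B-C rows.

```r
library(dplyr)

# Stand-in for df2, rebuilt from the str() output in the question:
# var1 codes 1 2 1 1 2 2 2 2 1 1, var2 all "C" with "D" still a declared level.
df2 <- data.frame(
  var1 = factor(c("A", "B", "A", "A", "B", "B", "B", "B", "A", "A")),
  var2 = factor(rep("C", 10), levels = c("C", "D"))
)

# .drop = FALSE (assumes dplyr >= 0.8.0) keeps empty factor-level groups,
# and n() evaluates to 0 for them.
res <- df2 %>%
  group_by(var1, var2, .drop = FALSE) %>%
  summarise(count = n()) %>%
  ungroup()

res
```

Here the zero cells come straight from the grouping step, so no fill value is needed and count stays an integer column.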

Nice - if you make it `fill = list(count = 0L)`, the `count` column will stay an integer. - thelatemail, Jan 11 '19 at 02:10
@Julius Vainora, I think `complete()` has made my life complete. Thank you! - llewmills, Jan 11 '19 at 03:05
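
A minimal sketch of the integer-fill suggestion from the comment above, using a stand-in df2 rebuilt from the str() output in the question. With the tidyr version used in the answer, filling with 0 gave a `<dbl>` column; newer tidyr releases may cast the fill value to the column's type, so the class of the first result can differ by version. Filling with 0L should give an integer column either way.

```r
library(dplyr)
library(tidyr)

# Stand-in for df2, rebuilt from the str() output in the question.
df2 <- data.frame(
  var1 = factor(c("A", "B", "A", "A", "B", "B", "B", "B", "A", "A")),
  var2 = factor(rep("C", 10), levels = c("C", "D"))
)

counts <- df2 %>%
  group_by(var1, var2) %>%
  summarise(count = n()) %>%
  ungroup()

# Fill with a double 0: the resulting class depends on the tidyr version.
filled_dbl <- counts %>% complete(var1, var2, fill = list(count = 0))
# Fill with an integer 0L: count remains an integer column.
filled_int <- counts %>% complete(var1, var2, fill = list(count = 0L))

class(filled_dbl$count)
class(filled_int$count)
```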
