Say we have a data frame with two variables, var1 and var2, each a factor with two levels:
library(dplyr)
df <- data.frame(var1 = factor(sample(c("A", "B"), 20, replace = TRUE)),
                 var2 = factor(rep(c("C", "D"), each = 10)))
When we summarise this data frame
df %>% group_by(var1, var2) %>% summarise(count = n())
We get
# A tibble: 4 x 3
# Groups: var1 [?]
var1 var2 count
<fct> <fct> <int>
1 A C 5
2 A D 4
3 B C 5
4 B D 6
But if we remove all observations for one factor level (here, level D of var2)
df2 <- df[1:10,]
And summarise
df2 %>% group_by(var1, var2) %>% summarise(count = n())
We get
# A tibble: 2 x 3
# Groups: var1 [?]
var1 var2 count
<fct> <fct> <int>
1 A C 5
2 B C 5
The A-D and B-D cells are (unsurprisingly) not summarised because there are no longer any observations in these cells.
My question: is there a quick way to report these cells as 0 instead of omitting them from the summary table?
I know the D level of var2 is still embedded in that factor in df2 because
str(df2)
yields
'data.frame': 10 obs. of 2 variables:
$ var1: Factor w/ 2 levels "A","B": 1 2 1 1 2 2 2 2 1 1
$ var2: Factor w/ 2 levels "C","D": 1 1 1 1 1 1 1 1 1 1
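As a quick check that the empty level really is retained, base R's table() does report the empty cells. This is only a minimal sketch: the data frame is rebuilt by hand from the str() output above so that it runs on its own, and `counts` is just an illustrative name.

```r
# Rebuild a data frame shaped like df2 from the str() output:
# var2 contains only "C" but still carries the unused level "D".
df2 <- data.frame(
  var1 = factor(c("A", "B", "A", "A", "B", "B", "B", "B", "A", "A")),
  var2 = factor(rep("C", 10), levels = c("C", "D"))
)

# Both levels are still listed, even though "D" never occurs.
levels(df2$var2)

# table() cross-tabulates over all factor levels, so the A-D and B-D
# cells appear with a count of 0 instead of being dropped.
counts <- table(df2$var1, df2$var2)
counts
```

So the information needed to fill in the zero cells is there in the factor; it is only the dplyr summary that leaves them out.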
So how do I get dplyr to report the 0 cells as well?