Adding NULL when no variable data

Question

Below is a sample data frame that illustrates my problem. When a group has no rows for one of the values of a variable, R returns nothing for that combination. With the data below, R returns:

Course   Gender  n
English1 Female  1
English1 Male    3
English2 Female  2
English2 Male    1
English2 Unknown 1
English3 Female  3
English3 Unknown 1

df1 <- data.frame("Course"=c("English1", "English1", "English1", "English1", 
                             "English2", "English2", "English2", "English2", 
                             "English3", "English3", "English3", "English3"),  
                  Gender=c("Male", "Female", "Male", "Male", "Male", "Female", 
                           "Unknown", "Female", "Female", "Female", "Female", 
                           "Unknown"),  Grade=c("A", "A", "C", "D", "D", "A", "B", 
                                                "C", "B", "D", "A", "C"))
library(dplyr)
df1 %>% group_by(Course, Gender) %>% count

What I am trying to do is return NULL or 0 when a Gender has no counts within a Course group. I would like the output to look like this (the new rows are tagged with *):

Course   Gender  n
English1 Female  1
English1 Male    3
English1 Unknown 0*
English2 Female  2
English2 Male    1
English2 Unknown 1
English3 Female  3
English3 Male    0*
English3 Unknown 1

I need this because an R Markdown output requires identical groups (three genders for each course). Any help is greatly appreciated.

`group_by(Course, Gender, .drop = FALSE)` solves this if you're using `dplyr 0.8.0` or higher — arg0naut91, Feb 20 '19 at 22:15

score 2 · Answer 1 · answered Feb 20 '19 at 20:17

# Sum a helper column of ones over every Gender x Course cell, then reorder the columns
data.frame(xtabs(a ~ Gender + Course, cbind(a = 1, df1)))[c(2, 1, 3)]
    Course  Gender Freq
1 English1  Female    1
2 English1    Male    3
3 English1 Unknown    0
4 English2  Female    2
5 English2    Male    1
6 English2 Unknown    1
7 English3  Female    3
8 English3    Male    0
9 English3 Unknown    1

If the row order does not matter:

data.frame(xtabs(Grade~.,cbind(Grade=1,df1)))
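
To see why a cross-tabulation fills in the zeros, here is a minimal base R sketch. It assumes the same data as the question (the Grade column is left out because it is not needed for counting) and uses a one-sided formula, which makes xtabs count rows instead of summing a helper column. The name `counts` is only illustrative.

```r
# Rebuild the Course and Gender columns from the question.
df1 <- data.frame(
  Course = rep(c("English1", "English2", "English3"), each = 4),
  Gender = c("Male", "Female", "Male", "Male", "Male", "Female",
             "Unknown", "Female", "Female", "Female", "Female", "Unknown")
)

# A one-sided formula makes xtabs() count rows per cell. The result is a
# full 3 x 3 contingency table, so combinations that never occur hold 0.
tab <- xtabs(~ Course + Gender, data = df1)

# as.data.frame() turns the table into one row per cell (column Freq).
counts <- as.data.frame(tab)

# Sort so that the rows are grouped by Course, as in the desired output.
counts <- counts[order(counts$Course, counts$Gender), ]
print(counts)
```

Because the table always has one cell per Course and Gender pair, the zero rows appear without any extra step.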

score 1 · Accepted Answer · answered Feb 20 '19 at 20:34

A dplyr-style solution to this has already been posted elsewhere: call the complete function (from tidyr) after the count function in your code. Use the fill=list(n=0) option to fill the missing rows with the value you need; it could be any other value.

Note that you have to ungroup first, or the operation will be done once per group and your rows will be duplicated.

This is pretty straightforward and closely matches the way you describe your needs:

    library(tidyr)  # complete() comes from tidyr, not dplyr

    df1 %>%
      group_by(Course, Gender) %>%
      count %>%
      ungroup() %>%
      complete(Course, Gender, fill = list(n = 0))

# A tibble: 9 x 3
  Course   Gender      n
  <fct>    <fct>   <dbl>
1 English1 Female      1
2 English1 Male        3
3 English1 Unknown     0
4 English2 Female      2
5 English2 Male        1
6 English2 Unknown     1
7 English3 Female      3
8 English3 Male        0
9 English3 Unknown     1

Answer 3 · arg0naut91 · answered Feb 20 '19 at 23:32

As of dplyr 0.8.0, you can just add .drop = FALSE to the statement:

df1 %>% 
  group_by(Course, Gender, .drop = FALSE) %>% 
  count

Output:

# A tibble: 9 x 3
# Groups:   Course, Gender [9]
  Course   Gender      n
  <fct>    <fct>   <int>
1 English1 Female      1
2 English1 Male        3
3 English1 Unknown     0
4 English2 Female      2
5 English2 Male        1
6 English2 Unknown     1
7 English3 Female      3
8 English3 Male        0
9 English3 Unknown     1

Note that this can be simplified, because count alone also accepts .drop = FALSE:

df1 %>% count(Course, Gender, .drop = FALSE)

# A tibble: 9 x 3
  Course   Gender      n
  <fct>    <fct>   <int>
1 English1 Female      1
2 English1 Male        3
3 English1 Unknown     0
4 English2 Female      2
5 English2 Male        1
6 English2 Unknown     1
7 English3 Female      3
8 English3 Male        0
9 English3 Unknown     1
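
One caveat, offered as a hedged sketch: .drop = FALSE keeps empty groups only for unused factor levels. The outputs above show <fct> columns, which suggests the data frame was built when data.frame() still converted strings to factors. On R 4.0 or later the default is stringsAsFactors = FALSE, so the columns would be character and would need converting first. The names `chr_counts` and `fct_counts` are illustrative only.

```r
library(dplyr)

# Build the data with character columns, as R >= 4.0 does by default.
df1 <- data.frame(
  Course = rep(c("English1", "English2", "English3"), each = 4),
  Gender = c("Male", "Female", "Male", "Male", "Male", "Female",
             "Unknown", "Female", "Female", "Female", "Female", "Unknown"),
  stringsAsFactors = FALSE
)

# Character columns have no unused levels, so nothing is added here.
chr_counts <- df1 %>% count(Course, Gender, .drop = FALSE)

# After converting to factors, every Course x Gender level pair is kept.
fct_counts <- df1 %>%
  mutate(Course = factor(Course), Gender = factor(Gender)) %>%
  count(Course, Gender, .drop = FALSE)

print(nrow(chr_counts))
print(fct_counts)
```

If the columns are already factors, the conversion step is harmless and can simply be left out.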
