Below is a sample data frame that illustrates my problem. Some groups have no rows for certain values of a variable, so R returns nothing for those combinations. That is, with the data below R returns:

Course   Gender  n
English1 Female  1
English1 Male    3
English2 Female  2
English2 Male    1
English2 Unknown 1
English3 Female  3
English3 Unknown 1

df1 <- data.frame(
  Course = c("English1", "English1", "English1", "English1",
             "English2", "English2", "English2", "English2",
             "English3", "English3", "English3", "English3"),
  Gender = c("Male", "Female", "Male", "Male", "Male", "Female",
             "Unknown", "Female", "Female", "Female", "Female",
             "Unknown"),
  Grade  = c("A", "A", "C", "D", "D", "A", "B",
             "C", "B", "D", "A", "C")
)
library(dplyr)
df1 %>% group_by(Course, Gender) %>% count

What I am trying to do is return NULL or 0 when a Gender has no rows within a Course group. I would like the result to look like this (I tagged the new rows with *):

Course   Gender  n
English1 Female  1
English1 Male    3
English1 Unknown 0*
English2 Female  2
English2 Male    1
English2 Unknown 1
English3 Female  3
English3 Male    0*
English3 Unknown 1

I need this because the groups must be identical (three genders for each course) for an R Markdown output. Any help is greatly appreciated.

jay.sf
Tyler

3 Answers

data.frame(xtabs(a ~ Gender + Course, cbind(a = 1, df1)))[c(2, 1, 3)]
    Course  Gender Freq
1 English1  Female    1
2 English1    Male    3
3 English1 Unknown    0
4 English2  Female    2
5 English2    Male    1
6 English2 Unknown    1
7 English3  Female    3
8 English3    Male    0
9 English3 Unknown    1

If you do not care about the ordering then:

data.frame(xtabs(Grade ~ ., cbind(Grade = 1, df1)))
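
To spell out what the one-liners above do, here is a more verbose sketch of the same idea. The helper column `a`, the result name `res`, and the count column name `n` are illustrative choices, not part of the original answer. Because `xtabs` builds a full contingency table, every Course x Gender cell appears, and empty cells sum to 0.

```r
# Sample data from the question (the Grade column is not needed for counting)
df1 <- data.frame(
  Course = rep(c("English1", "English2", "English3"), each = 4),
  Gender = c("Male", "Female", "Male", "Male", "Male", "Female",
             "Unknown", "Female", "Female", "Female", "Female", "Unknown")
)

# Add a helper column of ones; xtabs() sums it within every
# Course x Gender cell, so combinations with no rows get a sum of 0
counts <- xtabs(a ~ Course + Gender, data = cbind(a = 1, df1))

# Convert the table to long format and name the count column "n"
res <- as.data.frame(counts, responseName = "n")

# Sort by Course, then Gender, to match the layout asked for in the question
res <- res[order(res$Course, res$Gender), ]
res
```

Sorting at the end is only needed if the row order matters for the report.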
Onyambu

Actually, this has already been solved with a dplyr-style pipeline: call the complete function (from the tidyr package) after count in your code. The fill = list(n = 0) argument fills the count column n of the missing rows with 0, but you could fill with any other value.

Note that you have to ungroup first, or the operation will be done once per group, thus duplicating your rows.

This is pretty straightforward and closely matches the way you expressed your needs:

    library(tidyr)  # provides complete()

    df1 %>%
      group_by(Course, Gender) %>%
      count() %>%
      ungroup() %>%
      complete(Course, Gender, fill = list(n = 0))



 # A tibble: 9 x 3
  Course   Gender      n
  <fct>    <fct>   <dbl>
1 English1 Female      1
2 English1 Male        3
3 English1 Unknown     0
4 English2 Female      2
5 English2 Male        1
6 English2 Unknown     1
7 English3 Female      3
8 English3 Male        0
9 English3 Unknown     1
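
As a possible shortcut, the following sketch passes the grouping columns straight to `count()`. Called this way on an ungrouped data frame, `count()` returns an ungrouped result, so the explicit `ungroup()` step should not be needed. The result name `res` and the integer fill value `0L` (used to keep `n` an integer column) are choices made here for illustration.

```r
library(dplyr)
library(tidyr)

# Sample data from the question (the Grade column is not needed for counting)
df1 <- data.frame(
  Course = rep(c("English1", "English2", "English3"), each = 4),
  Gender = c("Male", "Female", "Male", "Male", "Male", "Female",
             "Unknown", "Female", "Female", "Female", "Female", "Unknown")
)

res <- df1 %>%
  # Count rows per observed Course/Gender pair; the result is ungrouped
  count(Course, Gender) %>%
  # Add the missing Course/Gender combinations and set their count to 0
  complete(Course, Gender, fill = list(n = 0L))

res
```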
Just Burfi

As of dplyr 0.8.0, you can just add .drop = FALSE to the statement:

df1 %>% 
  group_by(Course, Gender, .drop = FALSE) %>% 
  count

Output:

# A tibble: 9 x 3
# Groups:   Course, Gender [9]
  Course   Gender      n
  <fct>    <fct>   <int>
1 English1 Female      1
2 English1 Male        3
3 English1 Unknown     0
4 English2 Female      2
5 English2 Male        1
6 English2 Unknown     1
7 English3 Female      3
8 English3 Male        0
9 English3 Unknown     1

Note that this can be simplified: it also works if you call count on its own:

df1 %>% count(Course, Gender, .drop = FALSE)

# A tibble: 9 x 3
  Course   Gender      n
  <fct>    <fct>   <int>
1 English1 Female      1
2 English1 Male        3
3 English1 Unknown     0
4 English2 Female      2
5 English2 Male        1
6 English2 Unknown     1
7 English3 Female      3
8 English3 Male        0
9 English3 Unknown     1
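
One caveat worth keeping in mind: `.drop = FALSE` only keeps empty groups for factor columns, because the empty combinations come from the factor levels. The output above shows `<fct>` columns, but since R 4.0.0 `data.frame()` no longer converts strings to factors by default, so on a recent R installation the columns of `df1` would be character. A minimal sketch that converts them explicitly first (the result name `res` is an illustrative choice):

```r
library(dplyr)

# Sample data from the question; on R >= 4.0.0 these columns are character
df1 <- data.frame(
  Course = rep(c("English1", "English2", "English3"), each = 4),
  Gender = c("Male", "Female", "Male", "Male", "Male", "Female",
             "Unknown", "Female", "Female", "Female", "Female", "Unknown")
)

res <- df1 %>%
  # Convert to factors so that unused level combinations can be kept
  mutate(Course = factor(Course), Gender = factor(Gender)) %>%
  # .drop = FALSE keeps Course/Gender combinations with zero rows
  count(Course, Gender, .drop = FALSE)

res
```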
arg0naut91