Create dummy variable for student with All As OR As and Bs in R (Tidyverse)

Question

Good Afternoon,

I need to create a list of Students who have all As for this Semester. I need to create another list of students who have all Bs for this semester. I can't figure out how to actually get this done with the data I have. Below is what I have and what I'm looking for. Any thoughts?

original_df <- 
  tribble(~id, ~subject, ~grade,
          "001", "ela", "A+",
          "001", "math", "A",
          "001", "science", "A-",
          "002", "ela", "A",
          "002", "math", "B+",
          "002", "science", "B-",
          "003", "ela", "A",
          "003", "math", "A",
          "003", "science", "A-",
          "004", "ela", "C",
          "004", "math", "C",
          "004", "science", "A+",
          )

summarized_df <- 
  tribble(~id, ~all_As, ~As_and_Bs,
          "001", 1, 0, 
          "002", 0, 1, 
          "003", 1, 0,
          "004", 0, 0
          )

akrun · Answer 1 · 2021-04-22T21:20:43.863

4

One method is after grouping by 'id', use a regex to check for 'A', or extract the letters by removing the punct and check if all of 'A', 'B' are present

library(dplyr)
library(stringr)
original_df %>%
   group_by(id) %>% 
   summarise(all_As = +(all(str_detect(grade, 'A'))),
     As_and_Bs = +(all(c('A', 'B') %in% str_remove(grade, '[-+]'))),
        .groups = 'drop')

-output

# A tibble: 4 x 3
#  id    all_As As_and_Bs
#* <chr>  <int>     <int>
#1 001        1         0
#2 002        0         1
#3 003        1         0
#4 004        0         0

Or as @BenBolker mentioned in the comments

original_df %>%
   group_by(id) %>% 
   summarise(all_As=all(grepl("^A",grade)),
             As_and_Bs=!all_As && all(grepl("^[AB]",grade)))

edited Apr 22 '21 at 21:20

answered Apr 22 '21 at 21:14

akrun

874,273
37
540
662

1

I was going to use `all(grepl("^[AB]",grade))` (or substitute `str_detect` for `grepl` and reverse order of args) – Ben Bolker Apr 22 '21 at 21:15
1

might want to note that `+` is shorthand for `as.numeric()` here ... – Ben Bolker Apr 22 '21 at 21:16
@BenBolker wouldn't that also be TRUE if all 'A' are TRUE without any B. I meant if you are using it for As and Bs – akrun Apr 22 '21 at 21:16
1

Does `summarise` work sequentially? Can we do `all_As=all(grepl("^A",grade)), As_and_Bs=!all_As && all(grepl("^[AB],grade))` ? – Ben Bolker Apr 22 '21 at 21:18
1

@Mishalb it is argument in `summarise`. In some versions, if not provided, it gives a warning. by default it drops the last grouping variable in that order. You may also check [here](https://stackoverflow.com/questions/62140483/how-to-interpret-dplyr-message-summarise-regrouping-output-by-x-override/62140681#62140681) – akrun Apr 22 '21 at 21:24
I was playing around with this, and realized that if I change the grades a bit this solution doesn't work (and I'm trying to understand why". I changed the science grade for ID number "002" to a C- and it stopped working. Why? – Mishalb Apr 22 '21 at 21:26
@Mishalb It is working fine for me `original_df$grade[5] <- "C-"`. The first solution gives me 1 for 002 As_andBs as it have both A and B. The second solution gives FALSE. Depends on how you want the output i.e. are you checking only values in that id are A and B and not others – akrun Apr 22 '21 at 21:29

score 2 · Answer 2 · answered Apr 22 '21 at 21:27

A data.table option

setDT(original_df)[
  ,
  .(
    all_As = +!var(startsWith(grade, "A")),
    As_and_Bs = +all(c("A", "B") %in% substr(grade, 1, 1))
  ), id
]

gives

    id all_As As_and_Bs
1: 001      1         0
2: 002      0         1
3: 003      1         0
4: 004      0         0

thelatemail · Answer 3 · 2021-04-22T23:02:45.290

Another data.table option, trying to separate out the functions and input as much as possible to make it flexible.

library(data.table)
setDT(original_df)

only <- function(x,y) all(x == y)
incl <- function(x,y) all(x %in% y)

original_df[
  , 
  Map(
    function(l,f) f(l, substr(grade, 1, 1)),
    list(all_as = "A", all_bs = "B", as_and_bs = c("A","B")),
    c(only, only, incl)
  ),
  by=id
]

#    id all_as all_bs as_and_bs
#1: 001   TRUE  FALSE     FALSE
#2: 002  FALSE  FALSE      TRUE
#3: 003   TRUE  FALSE     FALSE
#4: 004  FALSE  FALSE     FALSE

tidyverse translation:

original_df %>%
  group_by(id) %>%
  mutate(subgrade = substr(grade,1,1)) %>%
  summarise(
    across(
      c(subgrade),
      list(
        all_as    = ~only(x="A", y=.x),
        all_bs    = ~only(x="B", y=.x),
        as_and_bs = ~incl(x=c("A","B"), y=.x)
      ),
      .names="{fn}"
    )
  )

#`summarise()` ungrouping output (override with `.groups` argument)
## A tibble: 4 x 4
#  id    all_as all_bs as_and_bs
#  <chr> <lgl>  <lgl>  <lgl>    
#1 001   TRUE   FALSE  FALSE    
#2 002   FALSE  FALSE  TRUE     
#3 003   TRUE   FALSE  FALSE    
#4 004   FALSE  FALSE  FALSE

Create dummy variable for student with All As OR As and Bs in R (Tidyverse)

3 Answers3