5
ID <- 1:6
math <- c("YES","NO","YES","NO",NA,NA)
history <- c(NA,NA,"NO","NO","YES",NA)

dt <- data.frame(ID, math, history)

  ID math history
1  1  YES    <NA>
2  2   NO    <NA>
3  3  YES      NO
4  4   NO      NO
5  5 <NA>     YES
6  6 <NA>    <NA>

I want to make an additional column ("pass") as follows

  1. If a student ever had "yes" at least once: "YES" (No matter if another subject is missing data or not or so.)

  2. If the student did not get "yes"

    • If both subjects are missing data : NA
    • if one of subjects is "NO" : "NO"

so, the columns would like this: (I can do this manually with this minimal example. but not with my real data)

> dt
  ID math history pass
1  1  YES    <NA>  YES
2  2   NO    <NA>   NO
3  3  YES      NO  YES
4  4   NO      NO   NO
5  5 <NA>     YES  YES
6  6 <NA>    <NA> <NA>

I tried to use

dt$pass <- ifelse(rowSums(dt[,-1]=="YES",na.rm=T)>0,"YES","NO")

this code, but it was tricky because if I put na.rm=TRUE they consider NA is "NO" (ID 6 student will be "NO")

if I put na.rm=FALSE, an only students that have both subject's data are considered.

In my data, I have really lots of columns, not only math and history.

Darren Tsai
  • 32,117
  • 5
  • 21
  • 51
yoo
  • 491
  • 3
  • 10

6 Answers6

6

A simple base solution is

dt$pass <- apply(dt[-1], 1, \(x) sort(x, dec = TRUE)[1])

# > dt
#   ID math history pass
# 1  1  YES    <NA>  YES
# 2  2   NO    <NA>   NO
# 3  3  YES      NO  YES
# 4  4   NO      NO   NO
# 5  5 <NA>     YES  YES
# 6  6 <NA>    <NA> <NA>

Its dplyr equivalent is

library(dplyr)

dt %>%
  rowwise() %>%
  mutate(pass = sort(c_across(-1), dec = TRUE)[1]) %>%
  ungroup()
Darren Tsai
  • 32,117
  • 5
  • 21
  • 51
4

Try this

fun <- function(x){
  if(all(is.na(x))) return(NA_character_)
  if(any(na.omit(x == "YES"))) return("YES")
  return("NO")
}

dt %>% rowwise() %>% mutate(pass = fun(c_across(-ID)))
Mohamed Desouky
  • 4,340
  • 2
  • 4
  • 19
3

A tidyverse solution that is robust with respect to the number (and names) of subjects:

library(tidyverse)

dt %>% 
  mutate(
    pass=dt %>% 
           pivot_longer(-ID) %>% 
           group_by(ID) %>% 
           summarise(
             anyYes=sum(value == "YES", na.rm=T), 
             anyNo=sum(value == "NO", na.rm=T)
           ) %>% 
           mutate(
             pass=ifelse(
                    anyYes >= 1, 
                    "YES", 
                    ifelse(anyNo >= 1, "NO", NA)
                   )
           ) %>% 
           pull(pass)
  )
  ID math history pass
1  1  YES    <NA>  YES
2  2   NO    <NA>   NO
3  3  YES      NO  YES
4  4   NO      NO   NO
5  5 <NA>     YES  YES
6  6 <NA>    <NA> <NA>

The key is the pivoting to long format.

Limey
  • 10,234
  • 2
  • 12
  • 32
3
library(tidyverse)

ID <- 1:6
math <- c("YES", "NO", "YES", "NO", NA, NA)
history <- c(NA, NA, "NO", "NO", "YES", NA)

dt <- data.frame(ID, math, history)

dt |> 
  rowwise() |> 
  mutate(pass = case_when(
    sum(c_across(-ID) == "YES", na.rm = TRUE) >= 1 ~ "YES",
    sum(c_across(-ID) == "NO", na.rm = TRUE) >= 1  ~ "No",
    TRUE                                           ~ NA_character_
  )) 

# Add ungroup() if you need to do further ungrouped processing.

#> # A tibble: 6 × 4
#>      ID math  history pass 
#>   <int> <chr> <chr>   <chr>
#> 1     1 YES   <NA>    YES  
#> 2     2 NO    <NA>    No   
#> 3     3 YES   NO      YES  
#> 4     4 NO    NO      No   
#> 5     5 <NA>  YES     YES  
#> 6     6 <NA>  <NA>    <NA>

Created on 2022-06-10 by the reprex package (v2.0.1)

Carl
  • 4,232
  • 2
  • 12
  • 24
3

Maybe rowMeans shows the behaviour you would like rowSums shoul have. In case there are only NA it returns NA.

c("NO", "YES")[1 + (rowMeans(dt[-1] == "YES", TRUE) > 0)]
#[1] "YES" "NO"  "YES" "NO"  "YES" NA

Or using your line of code:

ifelse(rowMeans(dt[-1]=="YES",na.rm=T)>0,"YES","NO")
#[1] "YES" "NO"  "YES" "NO"  "YES" NA
GKi
  • 37,245
  • 2
  • 26
  • 48
2

Seems like the order is what matters:

dt$pass <- NA
dt$pass[dt$math == "NO" | dt$history=="NO"] <- "NO"
dt$pass[dt$math == "YES" | dt$history=="YES"] <- "YES"

> dt
  ID math history pass
1  1  YES    <NA>  YES
2  2   NO    <NA>   NO
3  3  YES      NO  YES
4  4   NO      NO   NO
5  5 <NA>     YES  YES
6  6 <NA>    <NA> <NA>
tavdp
  • 421
  • 2
  • 6
  • 11