
I'm working with a dataset that has many columns called status1, status2, etc. Within those columns, it says if someone is exempt, complete, registered, etc.

Unfortunately, the exempt inputs are not consistent; here's a sample:

library(dplyr)

problem <- tibble(person = c("Corey", "Sibley", "Justin", "Ruth"),
                  status1 = c("7EXEMPT", "Completed", "Completed", "Pending"),
                  status2 = c("exempt", "Completed", "Completed", "Pending"),
                  status3 = c("EXEMPTED", "Completed", "Completed", "ExempT - 14"))

I'm trying to use case_when() to make a new column that has their final status. If it ever says completed, then they are completed. If it ever says exempt without saying complete, then they are exempt.

The important part is that I want my code to use contains("status"), or some equivalent that only targets the status columns and doesn't require typing them all, and I want it to only require a partial string match for exempt.

As for combining contains() with case_when(), I saw the question "mutate with case_when and contains", but I wasn't able to apply it to my case.

This is what I've tried to use so far, but as you can guess, it has not worked:

library(purrr)
library(dplyr)
library(stringr)
solution <- problem %>%
  mutate(final= case_when(pmap_chr(select(., contains("status")), ~
    any(c(...) == str_detect(., "Exempt") ~ "Exclude",
               TRUE ~ "Complete"
  ))))

Here's what I want the final product to look like:

solution <- tibble(person = c("Corey", "Sibley", "Justin", "Ruth"),
                   status1 = c("7EXEMPT", "Completed", "Completed", "Pending"),
                   status2 = c("exempt", "Completed", "Completed", "Pending"),
                   status3 = c("EXEMPTED", "Completed", "Completed", "ExempT - 14"),
                   final = c("Exclude", "Completed", "Completed", "Exclude")) 

Thank you!

J.Sabree

1 Answer


I think you are doing it backwards. Put case_when inside pmap_chr instead of the other way around:

library(dplyr)
library(purrr)
library(stringr)

problem %>%
  mutate(final = pmap_chr(select(., contains("status")), 
                          ~ case_when(any(str_detect(c(...), "(?i)Exempt")) ~ "Exclude",
                                      TRUE ~ "Completed")))

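For each pmap_chr iteration (that is, each row of the problem dataset), c(...) gathers that row's status values, and case_when checks whether any of them contains the string Exempt. The (?i) prefix in the pattern makes the match case-insensitive; it is equivalent to writing str_detect(c(...), regex("Exempt", ignore_case = TRUE)). Any row without an exempt match falls through to TRUE ~ "Completed".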
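As a quick check of that equivalence, here is a minimal sketch that runs both spellings over the status values from the question. The variable names (statuses, inline_flag, regex_helper) are made up for illustration.

```r
library(stringr)

# Status values taken from the question's sample data
statuses <- c("7EXEMPT", "exempt", "EXEMPTED", "ExempT - 14",
              "Completed", "Pending")

# Inline (?i) flag inside the pattern
inline_flag <- str_detect(statuses, "(?i)Exempt")

# The same match expressed with stringr's regex() helper
regex_helper <- str_detect(statuses, regex("Exempt", ignore_case = TRUE))

inline_flag
# [1]  TRUE  TRUE  TRUE  TRUE FALSE FALSE

identical(inline_flag, regex_helper)
# [1] TRUE
```

Either form can be dropped into the pipeline above without changing its result.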

Output:

# A tibble: 4 x 5
  person status1   status2   status3     final    
  <chr>  <chr>     <chr>     <chr>       <chr>    
1 Corey  7EXEMPT   exempt    EXEMPTED    Exclude  
2 Sibley Completed Completed Completed   Completed
3 Justin Completed Completed Completed   Completed
4 Ruth   Pending   Pending   ExempT - 14 Exclude
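
One caveat: the code above labels every non-exempt row "Completed" and checks for exempt first, whereas the question says a completed status should win over an exempt one. The following is a rough sketch of a variant that follows that stated order, assuming dplyr 1.0.4 or later for if_any(). The last two rows ("Mixed" and "Waiting") are hypothetical and not part of the original data; they are only there to show where the two approaches would differ.

```r
library(dplyr)
library(stringr)

# Sample data from the question, plus two hypothetical rows for illustration
problem <- tibble(
  person  = c("Corey", "Sibley", "Justin", "Ruth", "Mixed", "Waiting"),
  status1 = c("7EXEMPT", "Completed", "Completed", "Pending", "Completed", "Pending"),
  status2 = c("exempt", "Completed", "Completed", "Pending", "exempt", "Pending"),
  status3 = c("EXEMPTED", "Completed", "Completed", "ExempT - 14", "Completed", "Pending")
)

result <- problem %>%
  mutate(final = case_when(
    # Any completed status wins, as described in the question
    if_any(contains("status"),
           ~ str_detect(.x, regex("complete", ignore_case = TRUE))) ~ "Completed",
    # Otherwise, any partial match for "exempt" means exclude
    if_any(contains("status"),
           ~ str_detect(.x, regex("exempt", ignore_case = TRUE))) ~ "Exclude",
    # Neither completed nor exempt: left as NA here (an assumption)
    TRUE ~ NA_character_
  ))

result$final
# [1] "Exclude"   "Completed" "Completed" "Exclude"   "Completed" NA
```

For the four original rows this gives the same final column as the pmap_chr version; only the hypothetical rows come out differently.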
acylam
  • thank you! Quick question: can you explain what the (?i) means? – J.Sabree Jun 13 '19 at 20:42
  • @J.Sabree Was just editing my answer. It is a regex modifier that turns on case-insensitive mode. See this answer: https://stackoverflow.com/questions/15145659/what-do-i-and-i-in-regex-mean – acylam Jun 13 '19 at 20:46
  • thank you for the answer and the explanation! The explanation really helps to apply to new code. – J.Sabree Jun 13 '19 at 22:18