
Using the following dataset:

    structure(list(...1 = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 
    1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8, 
    9, 10, 11, 12), V1 = c("overstress", "flicker", "lotteri", "life", 
    "charg", "capac", "health", "drain", "degrad", "protector", "bright", 
    "use", "overstress", "flicker", "lotteri", "life", "charg", "capac", 
    "health", "drain", "degrad", "protector", "bright", "use", "overstress", 
    "flicker", "lotteri", "life", "charg", "capac", "health", "drain", 
    "degrad", "protector", "bright", "use"), term = c("corr1", "corr1", 
    "corr1", "corr1", "corr1", "corr1", "corr1", "corr1", "corr1", 
    "corr1", "corr1", "corr1", "corr2", "corr2", "corr2", "corr2", 
    "corr2", "corr2", "corr2", "corr2", "corr2", "corr2", "corr2", 
    "corr2", "corr3", "corr3", "corr3", "corr3", "corr3", "corr3", 
    "corr3", "corr3", "corr3", "corr3", "corr3", "corr3"), correlation = c(0.5, 
    0.43, 0.42, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0.53, 
    0.29, 0.25, 0.25, 0.23, 0.2, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, 0.45, 0.16, 0.15)), row.names = c(NA, -36L), class = c("tbl_df", 
    "tbl", "data.frame"))

I want to change each value in the term column from corr1, corr2 or corr3 to toil1, toil2 or toil3, respectively. I tried the following code, but I only get this warning:

    three_terms_corrs_gathered$term <- if (three_terms_corrs_gathered$term == "corr1") {
      toil1
    } else if (three_terms_corrs_gathered$term == "corr2") {
      toil2
    } else {
      toil3
    }

    Warning message:
    In if (three_terms_corrs_gathered$term == "corr1") { :
      the condition has length > 1 and only the first element will be used

So only the first condition is ever applied. What am I doing wrong?
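The warning comes from the fact that `==` compares element by element and returns one TRUE/FALSE per row, while `if` is a control-flow statement that expects a single TRUE/FALSE. A minimal base R sketch of this, using a hypothetical four-element vector as a stand-in for three_terms_corrs_gathered$term:

```r
# Hypothetical stand-in for three_terms_corrs_gathered$term
term <- c("corr1", "corr2", "corr3", "corr1")

# `==` is vectorized: one TRUE/FALSE per element, not a single value
cond <- term == "corr1"
print(cond)          # TRUE FALSE FALSE TRUE
print(length(cond))  # 4, but `if` expects exactly one TRUE/FALSE

# `if` is not vectorized. R before 4.2.0 warned and used only cond[1];
# R 4.2.0 and later raise an error instead. tryCatch reports which one occurs.
res <- tryCatch(
  if (cond) "toil1" else "other",
  warning = function(w) paste("warning:", conditionMessage(w)),
  error   = function(e) paste("error:", conditionMessage(e))
)
print(res)
```

Because cond[1] is TRUE for this data, older R assigns toil1 to the whole column, which is the behaviour described above. The fix is to use a vectorized tool instead of `if`.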

mischva11
  • Possible duplicate of https://stackoverflow.com/q/11865195/3358272 – r2evans Jan 26 '21 at 17:45
  • If so, I don't understand how to generalize it. The linked question only has two outcomes, so a single ifelse works there. In my case, I have three conditions. How do I generalize it? – Philip Olsson Jan 26 '21 at 18:14
  • Options: (1) nested `ifelse` (not my preference); or (2) `merge` with a lookup frame such as `data.frame(term=c("corr1","corr2"), newterm=c("toil1","toil2"))`. The second option can be done with `base::merge` or, if you're using the tidyverse (it appears you are), `dplyr::left_join`. A third option is `dplyr::case_when`. – r2evans Jan 26 '21 at 18:16
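Generalizing the two-outcome `ifelse` to three conditions means nesting another `ifelse` in the "no" branch of the previous one. A minimal base R sketch, assuming term is a plain character vector (the values below are hypothetical stand-ins for three_terms_corrs_gathered$term); anything that matches none of the conditions is kept unchanged:

```r
# Hypothetical sample of the term column, including one unmatched value
term <- c("corr1", "corr2", "corr3", "other")

# ifelse() is vectorized: it tests every element, unlike `if`
new_term <- ifelse(term == "corr1", "toil1",
            ifelse(term == "corr2", "toil2",
            ifelse(term == "corr3", "toil3",
                   term)))  # fallback: keep the original value
print(new_term)
```

On the real data this would be `three_terms_corrs_gathered$term <- ifelse(...)` with the same nesting. Each extra condition adds one more level, which is why the answer below prefers a lookup table or `case_when`.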


Three options:

  1. "Merge" mentality. This works very well when you have multiple disparate matches, as it is both efficient for code and easy to visualize and maintain. While the example here only has two replacements, the code doesn't change if corrs_df has 2 rows or 200, and entries in corrs_df that match nothing are silently discarded, doing no harm.

    library(dplyr)
    dat <- three_terms_corrs_gathered  # the data frame from the question
    corrs_df <- data.frame(term = c("corr1", "corr2"), newterm = c("toil1", "toil2"))
    dat %>%
      left_join(corrs_df, by = "term") %>%
      slice(c(1:3, 28:30))
    # # A tibble: 6 x 5
    #    ...1 V1         term  correlation newterm
    #   <dbl> <chr>      <chr>       <dbl> <chr>  
    # 1     1 overstress corr1        0.5  toil1  
    # 2     2 flicker    corr1        0.43 toil1  
    # 3     3 lotteri    corr1        0.42 toil1  
    # 4     4 life       corr3       NA    <NA>   
    # 5     5 charg      corr3       NA    <NA>   
    # 6     6 capac      corr3       NA    <NA>   
    
    dat %>%
      left_join(corrs_df, by = "term") %>%
      mutate(term = coalesce(newterm, term)) %>%
      slice(c(1:3, 28:30))
    # # A tibble: 6 x 5
    #    ...1 V1         term  correlation newterm
    #   <dbl> <chr>      <chr>       <dbl> <chr>  
    # 1     1 overstress toil1        0.5  toil1  
    # 2     2 flicker    toil1        0.43 toil1  
    # 3     3 lotteri    toil1        0.42 toil1  
    # 4     4 life       corr3       NA    <NA>   
    # 5     5 charg      corr3       NA    <NA>   
    # 6     6 capac      corr3       NA    <NA>   
    

    (You can obviously add %>% select(-newterm) to drop the helper column.) The coalesce function effectively says "give me the first non-NA value from these variables". newterm is NA when the corresponding term is not present in corrs_df, which we take to mean "make no change".

  2. dplyr::case_when. (If you're into it, then data.table::fcase does effectively the same thing.)

    dat %>%
      mutate(
        term = case_when(
          term == "corr1" ~ "toil1",
          term == "corr2" ~ "toil2",
          TRUE ~ term)
      ) %>%
      slice(c(1:3, 28:30))
    # # A tibble: 6 x 4
    #    ...1 V1         term  correlation
    #   <dbl> <chr>      <chr>       <dbl>
    # 1     1 overstress toil1        0.5 
    # 2     2 flicker    toil1        0.43
    # 3     3 lotteri    toil1        0.42
    # 4     4 life       corr3       NA   
    # 5     5 charg      corr3       NA   
    # 6     6 capac      corr3       NA   
    
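  3. Nested ifelse. Since you're using dplyr, though, it is better to use if_else, which is stricter than base ifelse: it requires the true and false values to have the same type, and it preserves classes such as Date that ifelse drops.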

    dat %>%
      mutate(
        term = if_else(term == "corr1", "toil1",
                       if_else(term == "corr2", "toil2", term))
      ) %>%
      slice(c(1:3, 28:30))
    # # A tibble: 6 x 4
    #    ...1 V1         term  correlation
    #   <dbl> <chr>      <chr>       <dbl>
    # 1     1 overstress toil1        0.5 
    # 2     2 flicker    toil1        0.43
    # 3     3 lotteri    toil1        0.42
    # 4     4 life       corr3       NA   
    # 5     5 charg      corr3       NA   
    # 6     6 capac      corr3       NA   
    

    This works fine for one or two levels of nesting, but beyond that it looks messy and becomes hard to follow. In my experience, code that is hard to follow is also hard to maintain, and it becomes easy to put a condition or value in the wrong place. Readability and maintainability matter a lot here.
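The "merge" mentality from option 1 also works without dplyr. A minimal base R sketch, assuming a plain character term column (the data frame below is a hypothetical stand-in for dat): match() looks up each term in the table, and the ifelse(is.na(...)) step plays the role of coalesce(newterm, term). Here the table has all three rows the question asks for, and the code is the same as it would be for a two-row table:

```r
# Hypothetical stand-in for the question's data (only the relevant column)
dat <- data.frame(term = c("corr1", "corr2", "corr3", "corr1", "other"),
                  stringsAsFactors = FALSE)

# Lookup table: one row per replacement; extend it by adding rows, not code
corrs_df <- data.frame(term    = c("corr1", "corr2", "corr3"),
                       newterm = c("toil1", "toil2", "toil3"),
                       stringsAsFactors = FALSE)

# match() returns the row of corrs_df for each term, or NA if there is none
idx <- match(dat$term, corrs_df$term)

# Like coalesce(newterm, term): use the lookup value where one exists,
# otherwise keep the original term
dat$term <- ifelse(is.na(idx), dat$term, corrs_df$newterm[idx])
print(dat)
```

Unlike base::merge, which sorts the result by the key column by default, match() keeps the rows in their original order.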

r2evans