Creating new column with dplyr::if_else condition in R

Question

I have written the following code to create a new column based on certain conditions on the previous column. I believe I need to add an additional condition to retain the previous values.

BP2018_spread <- BP2018_spread  %>%
 dplyr :: mutate(Period_Type = if_else(grepl("QTD",BP2018_spread$Month), 'QTD', if_else(grepl("YTD",BP2018_spread$Month),'YTD', 'NA')))

Current Output:
       Month      Period_Type
       <chr>      <chr>      
     1 " Apr"     NA         
     2 " Apr QTD" QTD        
     3 " Apr YTD" YTD        
     4 " Aug"     NA         
     5 " Aug QTD" QTD        
     6 " Aug YTD" YTD        
     7 " Dec"     NA         
     8 " Dec QTD" QTD        
     9 " Dec YTD" YTD        
    10 " Feb"     NA  


Desired Output:
   Month      Period_Type
       <chr>      <chr>      
     1 " Apr"     Apr         
     2 " Apr QTD" QTD        
     3 " Apr YTD" YTD        
     4 " Aug"     Aug         
     5 " Aug QTD" QTD        
     6 " Aug YTD" YTD        
     7 " Dec"     Dec         
     8 " Dec QTD" QTD        
     9 " Dec YTD" YTD        
    10 " Feb"     Feb

OR

Desired Output:
       Month      Period_Type
           <chr>      <chr>      
         1 " Apr"     MTD         
         2 " Apr QTD" QTD        
         3 " Apr YTD" YTD        
         4 " Aug"     MTD         
         5 " Aug QTD" QTD        
         6 " Aug YTD" YTD        
         7 " Dec"     MTD         
         8 " Dec QTD" QTD        
         9 " Dec YTD" YTD        
        10 " Feb"     MTD

Could you post the input data; what is BP2018_spread before the mutation? — CT Hall, Mar 14 '19 at 16:50
Hi CT, is it necessary to do so? Its a conditional if-else mutation? — Matthew Loh, Mar 14 '19 at 17:07
1) Don't use `data$column` inside dplyr verbs, just use the column name. 2) Don't use `"NA"` in quotes, just use `NA`. 3) If you want the original value where you're getting `NA`, in your code replace `NA` with the original value (`Month`), or with `"MTD"` if that's what you would prefer. — Gregor Thomas, Mar 14 '19 at 17:16
Re "is it necessary?" It is a courtesy. If you don't, it creates extra work (and guesswork) for answerers, eg as seen in FALL Gora's answer below. See https://stackoverflow.com/a/28481250 for some advice re posting R questions if interested. — Frank, Mar 14 '19 at 17:20
Adding on to Frank's points: (a) it makes it *much nicer* for answers. This is a simple question, and you got an answer in just over 30 minutes, which is pretty quick. I'd bet if you had posted the question with data to start, you would have had an answer in less than 10 minutes. (b) Especially for questions that are more complicated and take time to answer, multiple people will probably be working on a solution. They will all have to go to the bother of creating your data, unless you do it for them. Much more efficient to do it once. — Gregor Thomas, Mar 14 '19 at 17:25
(c) It prevents misunderstanding. FALL Gora's answer has a bunch of `as.character()` calls that you probably don't need, but their answer used `factor` columns where your data is `character`. If you provide your data, it avoids ambiguity like that. — Gregor Thomas, Mar 14 '19 at 17:25
Thanks guys for the advice. I will take note of this in future when posting questions! — Matthew Loh, Mar 14 '19 at 17:26
And just looking at the answers, looks like CT Hall and FALL Gora imported your data similarly, but a little different with `tibble`. Hunaidkhan saved it as a CSV and read in the file. newLearner answered early enough they probably didn't test at all. Jet did yet another way, perhaps copy/pasting from one of the `tibble` users, but perhaps not. You could have saved all of these people a minute or two each by taking the time yourself. You seem convinced---I don't mean to harp on this---but it's a very nice thing to do. — Gregor Thomas, Mar 14 '19 at 17:29

score 2 · Accepted Answer · answered Mar 14 '19 at 17:16

2

Can you try

    BP2018_spread <- BP2018_spread  %>%
                     dplyr::mutate(Period_Type = ifelse(grepl("QTD", Month), 'QTD',
                     ifelse(grepl("YTD", Month),'YTD', 'MTD')))

answered Mar 14 '19 at 17:16

cropgen

1,920
15
24

Jet · Answer 2 · 2019-04-08T15:34:25.443

2

Use case_when, which allows to avoid writing multiple encapsulated if_else statements :

BP2018_spread <- data.frame(
  Month = c(
    " Apr",
    " Apr QTD",
    " Apr YTD",
    " Aug",
    " Aug QTD",
    " Aug YTD",
    " Dec",
    " Dec QTD",
    " Dec YTD",
    " Feb"
  ))

BP2018_spread %>% 
mutate(Period_Type=case_when(grepl("QTD",BP2018_spread$Month) ~ 'QTD',
                             grepl("YTD",BP2018_spread$Month) ~ 'YTD',
                             TRUE ~ as.character(Month)))

Output :

      Month Period_Type
1       Apr         Apr
2   Apr QTD         QTD
3   Apr YTD         YTD
4       Aug         Aug
5   Aug QTD         QTD
6   Aug YTD         YTD
7       Dec         Dec
8   Dec QTD         QTD
9   Dec YTD         YTD
10      Feb         Feb

edited Apr 08 '19 at 15:34

answered Mar 14 '19 at 17:27

Jet

650
5
17

Point of clarification: `if_else` and `ifelse` *are* vectorized--that's how they differ from `if ... else` structures. `case_when` is also vectorized, and is a substitute from multiple, nested `ifelse`/`if_else` calls – camille Apr 07 '19 at 16:27
You are right, I edited. From the documentation of `case_when` that I linked, it allows to *"vectorise multiple if_else statements"*. I indeed meant that it allows to avoid using several encapsulated `if_else` statements. – Jet Apr 08 '19 at 15:32

score 1 · Answer 3 · edited Mar 14 '19 at 17:17

I think this can help:

BP2018_spread %>% 
  mutate(Period_Type = if_else(grepl("QTD",Period_Type) | grepl("YTD",Period_Type),
                               as.character(Period_Type),
                              as.character(Month) ))

Using this data:

#reproduce your dataset
BP2018_spread <- data.frame(
  Month = c(
    " Apr",
    " Apr QTD",
    " Apr YTD",
    " Aug",
    " Aug QTD",
    " Aug YTD",
    " Dec",
    " Dec QTD",
    " Dec YTD",
    " Feb"
  ),
  Period_Type = c(NA,
                  "QTD",
                  "YTD",
                  NA,
                  "QTD",
                  "YTD",
                  NA,
                  "QTD",
                  "YTD",
                  NA)
)

I didn't make this change, but we can see from the `` in OP's data printout that they are starting with `character` data. If you set `stringsAsFactors = FALSE` in your `data.frame` call, you will replicate OP's data better and you can get rid of all the `as.character()`s in your solution. — Gregor Thomas, Mar 14 '19 at 17:19

score 0 · Answer 4 · answered Mar 14 '19 at 17:19

0

You can try this method also

df <- read.csv("D:/X-tras/stackoverflow/STACK_YTD.csv",header = T)

v<- c("QTD","YTD")

df <- df %>%
  mutate(final = ifelse(grepl("YTD|QTD",Period_Type), str_extract(Period_Type, str_c(v, collapse = "|")), "MTD"))

answered Mar 14 '19 at 17:19

Hunaidkhan

1,411
2
11
21

score 0 · Answer 5 · answered Mar 14 '19 at 17:23

Input

BP2018_spread <- tibble(
  Month = c(
    " Apr",
    " Apr QTD",
    " Apr YTD",
    " Aug",
    " Aug QTD",
    " Aug YTD",
    " Dec",
    " Dec QTD",
    " Dec YTD",
    " Feb"
  )
)

Output 1

BP2018_spread %>% 
  mutate(Period_Type = if_else(
    grepl("QTD", Month),'QTD',
  if_else(grepl("YTD", Month),'YTD',
    trimws(Month)
  )))

OR

BP2018_spread %>%
  mutate(Period_Type = case_when(
    grepl("QTD", Month) ~ 'QTD',
    grepl("YTD", Month) ~ 'YTD',
    TRUE ~ trimws(Month)
  ))

Output 2

BP2018_spread %>% 
  mutate(Period_Type = if_else(
    grepl("QTD", Month),'QTD',
  if_else(grepl("YTD", Month),'YTD',
    'MTD'
  )))

OR

BP2018_spread %>%
  mutate(Period_Type = case_when(
    grepl("QTD", Month) ~ 'QTD',
    grepl("YTD", Month) ~ 'YTD',
    TRUE ~ 'MTD'
  ))

In both cases I used trimws to trim whitespace, also you don't need BP2018_spread$ as mutate understands column names. The second option of code under each output uses case_when which is easier to write and understand.

Creating new column with dplyr::if_else condition in R

5 Answers5

Input

Output 1

Output 2