0

I have the following dataframe

library(tidyverse)
x <- c(1,2,3,NA,NA,4,5)
y <- c(1,2,3,5,5,4,5)
z <- c(1,1,1,6,7,7,8)
df <- data.frame(x,y,z)

df
   x y z
1  1 1 1
2  2 2 1
3  3 3 1
4 NA 5 6
5 NA 5 7
6  4 4 7
7  5 5 8

I would like to update the dataframe according to the following conditions

  • If z==1, update to x=1, else leave the current value for x
  • If z==1, update to y=2, else leave the current value for y

The following code does the job fine

df %>% mutate(x=if_else(z==1,1,x),y=if_else(z==1,2,y))
   x y z
1  1 2 1
2  1 2 1
3  1 2 1
4 NA 5 6
5 NA 5 7
6  4 4 7
7  5 5 8

However, I have to add if_else statement for x and y mutate functions. This has the potential to make my code complicated and hard to read. To give you a SQL analogy, consider the following code

UPDATE df
SET x= 1, y= 2
WHERE z = 1;

I would like to achieve the following:

  • Specify the update condition ahead of time, so I don't have to repeat it for every mutate function
  • I would like to avoid using data.table or base R. I am using dplyr so I would like to stick to it for consistency
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
RDiesel
  • 21
  • 4

4 Answers4

0

Here is one option with map2. Loop through the 'x', 'y' columns of the dataset, along with the values to change, apply case_when based on the values of 'z' if it is TRUE, then return the new value, or else return the same column and bind the columns with the original dataset

library(dplyr)
library(purrr)
map2_df(df %>%
          select(x, y), c(1, 2), ~ case_when(df$z == 1 ~ .y, TRUE ~ .x)) %>%
     bind_cols(df %>%
                 select(z), .) %>%
     select(names(df))

Or using base R, create a logical vector, use that to subset the rows of columns 'x', 'y' and update by assigning to a list of values

i1 <- df$z == 1
df[i1, c('x', 'y')] <- list(1, 2)
df
#   x y z
#1  1 2 1
#2  1 2 1
#3  1 2 1
#4 NA 5 6
#5 NA 5 7
#6  4 4 7
#7  5 5 8

The advantage of both the solutions are that we can pass n number of columns with corresponding values to pass and not repeating the code

akrun
  • 874,273
  • 37
  • 540
  • 662
0
library(dplyr)
df %>%
    mutate(x = replace(x, z == 1, 1),
           y = replace(y, z == 1, 2))
#   x y z
#1  1 2 1
#2  1 2 1
#3  1 2 1
#4 NA 5 6
#5 NA 5 7
#6  4 4 7
#7  5 5 8

In base R

transform(df,
          x = replace(x, z == 1, 1),
          y = replace(y, z == 1, 2))

If you store the condition in a variable, you don't have to type it multiple times

condn = (df$z == 1)
transform(df,
          x = replace(x, condn, 1),
          y = replace(y, condn, 2))
Telepresence
  • 619
  • 2
  • 7
  • 22
0

If you have an SQL background, you should really check out :

library(data.table)
dt <- as.data.table(df)

set(dt, which(z == 1), c('x', 'y'), list(1, 2))
dt

# or perhaps more classic syntax

dt <- as.data.table(df)
dt
#    x y z
#1:  1 1 1
#2:  2 2 1
#3:  3 3 1
#4: NA 5 6
#5: NA 5 7
#6:  4 4 7
#7:  5 5 8

dt[z == 1, `:=`(x = 1, y = 2)]
dt

#    x y z
#1:  1 2 1
#2:  1 2 1
#3:  1 2 1
#4: NA 5 6
#5: NA 5 7
#6:  4 4 7
#7:  5 5 8

The last option is an update join. This is great if you have the lookup data already done upfront:

# update join:
dt <- as.data.table(df)
dt_lookup <- data.table(x = 1, y = 2, z = 1)

dt[dt_lookup, on = .(z), `:=`(x = i.x, y = i.y)]
dt
Cole
  • 11,130
  • 1
  • 9
  • 24
0

Using mutate_cond posted at dplyr mutate/replace several columns on a subset of rows we can do this:

df %>% mutate_cond(z == 1, x = 1, y = 2)

giving:

   x y z
1  1 2 1
2  1 2 1
3  1 2 1
4 NA 5 6
5 NA 5 7
6  4 4 7
7  5 5 8

sqldf

Of course you can directly implement it in SQL with sqldf -- ignore the warning message that the backend RSQLite issues.

library(sqldf)
sqldf(c("update df set x = 1, y = 2 where z = 1", "select * from df"))

base R

It straight-forward in base R:

df[df$z == 1, c("x", "y")] <- list(1, 2)
G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341