Suppose I have the following data.frames:
library(dplyr)
set.seed(13)
df <- tibble(A = sample(letters[1:2], 6, replace = TRUE), B = sample(1:3, 6, replace = TRUE))
new_df <- tibble(A = "a", B = 4)
Suppose I want to update all the rows of df where A == "a" so that B takes the value 4. (This is just an example; in general new_df has more than one row.) I can do this the following way:
df %>% left_join(new_df %>% rename(b = B), by = "A") %>% mutate(B = ifelse(is.na(b), B, b))
This works, but it does not look elegant. Is there a better way to do this?
I came across this issue while cleaning up data. I calculate a certain column from another column, which should be a unique id, but due to data collection issues it is not. I have another table with the correct ids, and I want to use it to update the incorrect ones. Usually the number of incorrect ids is low compared to the number of correct ids, so doing a join seems like overkill.
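For concreteness, here is a minimal sketch of the kind of lookup-style update I have in mind, written with base R's match() instead of a join. The names idx and updated are only illustrative, and it assumes the key values in new_df$A are unique. I am not sure it is any more idiomatic than the join above.

```r
library(dplyr)

set.seed(13)
df <- tibble(A = sample(letters[1:2], 6, replace = TRUE),
             B = sample(1:3, 6, replace = TRUE))
new_df <- tibble(A = "a", B = 4)

# For each row of df, find the position of its key in new_df (NA if absent).
# Assumption: keys in new_df$A are unique, so each row matches at most once.
idx <- match(df$A, new_df$A)

# Use the corrected value where a match exists, otherwise keep the original B.
updated <- df %>% mutate(B = ifelse(is.na(idx), B, new_df$B[idx]))
```

Unlike the join version, this does not leave a helper column b behind, but it only handles a single key column.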