I have two dataframes, one large, one small, with the columns only partially shared:
df1 <- data.frame(
  Utt = c("xyzxyz", "hi their", "how ae you?", "xxxxx", "yyzyzyz", "hybsfc"),
  File = c("F01", "F02", "F02", "F03", "F03", "F12"),
  x = 1:6,
  y = LETTERS[1:6],
  z = rnorm(6)
)
df2 <- data.frame(
  Utt = c("hi there", "how are you?"),
  File = c("F02", "F02")
)
Column Utt in df1 contains corrupted data for File == "F02" (the rest of the data in that column is okay). I want to replace the corrupted data with the cleaned-up data from column Utt in df2. How can that be done efficiently in dplyr?
A less-than-efficient method is filtering df1 for File == "F02" and mutating Utt with the values from df2$Utt:
library(dplyr)
df1 %>%
  filter(File == "F02") %>%
  mutate(Utt = df2$Utt)
It's not efficient because this returns only the mutated F02 rows, which then need to be joined back to the rest of the original df1 to obtain the desired result:
           Utt File x y          z
1       xyzxyz  F01 1 A -2.5514777
2     hi there  F02 2 B -2.7582295
3 how are you?  F02 3 C  2.1081157
4        xxxxx  F03 4 D  0.1628507
5      yyzyzyz  F03 5 E -1.1904290
6       hybsfc  F12 6 F -1.1244349
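For illustration, here is a minimal sketch of one way the filter-and-rejoin round trip might be avoided: overwrite only the matching rows inside a single mutate() call, using base R's replace(). It rests on an assumption not stated above, namely that df2 has exactly as many rows as df1 has rows with File == "F02", and in the same order. The set.seed() call, the stringsAsFactors = FALSE arguments, and the name fixed are additions for reproducibility and illustration only.

```r
library(dplyr)

set.seed(1)  # added only so that z is reproducible; not part of the original data

df1 <- data.frame(
  Utt = c("xyzxyz", "hi their", "how ae you?", "xxxxx", "yyzyzyz", "hybsfc"),
  File = c("F01", "F02", "F02", "F03", "F03", "F12"),
  x = 1:6,
  y = LETTERS[1:6],
  z = rnorm(6),
  stringsAsFactors = FALSE  # keep Utt as character on R versions before 4.0
)
df2 <- data.frame(
  Utt = c("hi there", "how are you?"),
  File = c("F02", "F02"),
  stringsAsFactors = FALSE
)

# Assumption: df2$Utt lines up one-to-one, in order, with the F02 rows of df1.
# replace() overwrites only the positions where File == "F02";
# every other row and column of df1 is left untouched.
fixed <- df1 %>%
  mutate(Utt = replace(Utt, File == "F02", df2$Utt))

print(fixed)
```

If the row order or row count of df2 cannot be guaranteed, positional replacement like this would silently misalign the values, and a key-based update would be the safer choice.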