Thanks for your help.
I have two data frames. The data frames are of differing lengths. One is a data set that often includes mistakes. Another is a set of corrections. I'm trying to do two things at once with these two data sets. First, I would like to compare three columns of df1
with three columns in df2
. This means reading the first row of data in df1
and seeing if those three variables match any of the rows in df2
for those three variables, then moving on to row 2, and so on. If a match is found in a row for all three variables, then replace the value in one of the columns in df1
with a replacement in df2
. I have included an example below.
df1 <- data.frame("FIRM" = c("A", "A", "B", "B", "C", "C"), "LOCATION" = c("N", "S", "N", "S", "N", "S"), "NAME" = c("Apple", "Blooberry", "Cucumber", "Date", "Egplant", "Fig"))
df2 <- data.frame("FIRM" = c("A", "C"), "LOCATION" = c("S", "N"), "NAME" = c("Blooberry", "Egplant"), "NEW_NAME" = c("Blueberry", "Eggplant"))
df1[] <- lapply(df1, as.character)
df2[] <- lapply(df2, as.character)
If there is a row in df1
that matches against "FIRM", "LOCATION" and "NAME" in df2
, then I would like to replace the "NAME" in df1
with "NEW_NAME" in df2
, such that "Blooberry" and "Egplant" change to "Blueberry" and "Eggplant".
I can do the final replacements using*:
df1$NAME[match(df2$NAME, df1$NAME)] <- df2$NEW_NAME[match(df1$NAME[match(df2$NAME, df1$NAME)], df2$NAME)]
But this does not include the constraint of the three matches. Also, my code seems unnecessarily complex with the nested match functions. I think I could accomplish this task by subsetting df2
and using a for
loop to match rows one by one but I would think that there is a better vectorized method out there.
*I'm aware that inside the brackets of df2$NEW_NAME[]
, the function calls both elements in that column, but I'm trying to generalize.