3

Thanks for your help.

I have two data frames. The data frames are of differing lengths. One is a data set that often includes mistakes. Another is a set of corrections. I'm trying to do two things at once with these two data sets. First, I would like to compare three columns of df1 with three columns in df2. This means reading the first row of data in df1 and seeing if those three variables match any of the rows in df2 for those three variables, then moving on to row 2, and so on. If a match is found in a row for all three variables, then replace the value in one of the columns in df1 with a replacement in df2. I have included an example below.

df1 <- data.frame("FIRM" = c("A", "A", "B", "B", "C", "C"), "LOCATION" = c("N", "S", "N", "S", "N", "S"), "NAME" = c("Apple", "Blooberry", "Cucumber", "Date", "Egplant", "Fig"))
df2 <- data.frame("FIRM" = c("A", "C"), "LOCATION" = c("S", "N"), "NAME" = c("Blooberry", "Egplant"), "NEW_NAME" = c("Blueberry", "Eggplant"))
df1[] <- lapply(df1, as.character)
df2[] <- lapply(df2, as.character)

If there is a row in df1 that matches against "FIRM", "LOCATION" and "NAME" in df2, then I would like to replace the "NAME" in df1 with "NEW_NAME" in df2, such that "Blooberry" and "Egplant" change to "Blueberry" and "Eggplant".

I can do the final replacements using*:

df1$NAME[match(df2$NAME, df1$NAME)] <- df2$NEW_NAME[match(df1$NAME[match(df2$NAME, df1$NAME)], df2$NAME)]

But this does not include the constraint of the three matches. Also, my code seems unnecessarily complex with the nested match functions. I think I could accomplish this task by subsetting df2 and using a for loop to match rows one by one but I would think that there is a better vectorized method out there.

*I'm aware that inside the brackets of df2$NEW_NAME[], the function calls both elements in that column, but I'm trying to generalize.

hmhensen
  • 2,974
  • 3
  • 22
  • 43
  • By the way, I have actually gotten around this in the past by pasting the three columns together in both data frames and matching the single column only, but I would really like to find a solution for the problem above. – hmhensen Sep 26 '17 at 21:14
  • 1
    If the core of your question is how to match on multiple columns, that has been asked on this site before: https://stackoverflow.com/q/6880450/3093387. You could just match on the interaction of the relevant variables. – josliber Sep 26 '17 at 21:15
  • 1
    @Frank Thanks. I usually do that when importing data but didn't realize you could do that in the `data.frame()` command. – hmhensen Sep 26 '17 at 21:18
  • @josliber It's to match on multiple columns and update one table, similar to https://stackoverflow.com/q/6112260/ I think. C8H's answer extends to multiple columns there, at least. Without the update aspect, it's just generic join FAQ: https://stackoverflow.com/questions/1299871/how-to-join-merge-data-frames-inner-outer-left-right – Frank Sep 26 '17 at 21:19

1 Answers1

3

Consider an all.x merge (i.e., LEFT JOIN in SQL speak) with an ifelse conditional comparing NAME and NEW_NAME.

Below, transform allows same line column assignment and the bracketed sequence at end keeps first three columns.

mdf <- transform(merge(df1,df2,all.x=TRUE),NAME=ifelse(is.na(NEW_NAME),NAME,NEW_NAME))[1:3]
mdf
#   FIRM LOCATION      NAME
# 1    A        N     Apple
# 2    A        S Blueberry
# 3    B        N  Cucumber
# 4    B        S      Date
# 5    C        N  Eggplant
# 6    C        S       Fig
Parfait
  • 104,375
  • 17
  • 94
  • 125
  • 3
    Instead of `1:3`, I'd do `names(df1)` at the end. – Frank Sep 26 '17 at 21:14
  • It took me a bit more of researching your answer to see that it actually did everything I asked. I've never used a SQL database so I am not familiar with the intricacies of database joins and did not realize that is what `merge()` actually does. So it appears that the merge function itself does the matching for all same-named columns all on its own! That's a revelation to me. Thanks! – hmhensen Sep 26 '17 at 22:43
  • Haha...Awesome! Glad to help with solution and eureka moment. Happy coding! – Parfait Sep 27 '17 at 00:26