
Say I have the following dataframes:

DF1 <- data.frame("A" = rep(c("A","B"), 18),
                  "B" = rep(c("C","D","E"), 12),
                  "NUM"= rep(rnorm(36,10,1)),
                  "TEST" = rep(NA,36))

DF2 <- data.frame("A" = rep("A",6),
                  "B" = rep(c("C","D"),6),
                  "VAL" = rep(c(1,3),3))

*Note: Each unique combination of variables `A` and `B` in `DF2` should have a unique `VAL`.*

For each row of DF1, I would like to replace the NA in TEST with the corresponding value of VAL from DF2 whenever that row's values in columns A and B match the values of A and B in a row of DF2; otherwise I'd leave TEST as NA. How would I do this with match, without looping through each combination?

Ideally, an answer would scale to two data frames with many columns to match upon.
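A minimal sketch of one way to do this with match() on a pasted composite key (my own illustration, not an accepted approach from this thread; it uses a reduced two-row DF2 with one VAL per A/B combination, and paste()'s default separator assumes the key values contain no spaces):

```r
set.seed(1)  # reproducible example data
DF1 <- data.frame(A = rep(c("A","B"), 18),
                  B = rep(c("C","D","E"), 12),
                  NUM = rnorm(36, 10, 1),
                  TEST = NA)
# Lookup table with one VAL per unique A/B combination
DF2 <- data.frame(A = "A", B = c("C","D"), VAL = c(1, 3))

# match() returns, for each DF1 row, the index of the matching DF2 row
# (or NA if there is none), so unmatched rows keep TEST = NA
idx <- match(paste(DF1$A, DF1$B), paste(DF2$A, DF2$B))
DF1$TEST <- DF2$VAL[idx]
```

This avoids an explicit loop and preserves DF1's row order, since match() is vectorised over the composite keys.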

  • I think in your DF2 there are multiple rows that could match up to DF1's 1st and 2nd columns, so I am not sure which VAL should be taken. Also, based on the example, the dimensions of the two datasets are not the same. – akrun Mar 23 '15 at 14:26
  • Sorry, each unique combination in `DF2` should have a unique `VAL`. I need to figure out a way to encode that in the example. – goldisfine Mar 23 '15 at 14:29
  • In that case, there would be only two rows for DF2, i.e. `DF2 <- data.frame(A=rep('A',2), B=c('C', 'D'), VAL=rnorm(2))`, and `merge` will get the result: `merge(DF1, DF2, all=TRUE)` – akrun Mar 23 '15 at 14:30
  • @goldisfine: That doesn't clarify things much. We need to know _which_ `VAL` should be chosen for a given combination of `A` and `B` since there are multiple. – Alex A. Mar 23 '15 at 14:31
  • Now there is a unique value for each combination in `DF2` so I think that there is only one possible option for `VAL` for each row in `DF1`. I guess `DF2` could be reduced to only the unique combinations of the predictors. – goldisfine Mar 23 '15 at 14:34
  • In Python pandas, you could just simply set a compound key using variables A,B. Then this is a simple left-join/merge. – smci Mar 24 '15 at 03:07

2 Answers

# This is your DF1
DF1 <- data.frame("A" = rep(c("A","B"), 18),
                  "B" = rep(c("C","D","E"), 12),
                  "NUM" = rnorm(36, 10, 1),
                  "TEST" = rep(NA, 36))

# This is a DF2 I created, with a unique VAL for each A/B combination
DF2 <- data.frame("A" = rep(c("A","B"), 3),
                  "B" = rep(c("C","D","E"), 2),
                  "VAL" = 1:6)

# And this is, I assume, the answer you want. Note that merge() sorts
# its result by the "by" columns, so re-align VAL to DF1's row order
# by key before assigning, rather than copying the column positionally.
tmp <- merge(DF1, DF2, by = c("A","B"), all.x = TRUE, all.y = FALSE)
DF1$TEST <- tmp$VAL[match(paste(DF1$A, DF1$B), paste(tmp$A, tmp$B))]
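One caveat worth flagging (my note, not part of the answer): merge() sorts its result by the by columns, so the merged rows need not line up with DF1's original rows. A sketch that guards against this by carrying a row index through the merge (the row_id column name is my own helper):

```r
set.seed(1)  # reproducible example data
DF1 <- data.frame(A = rep(c("A","B"), 18),
                  B = rep(c("C","D","E"), 12),
                  NUM = rnorm(36, 10, 1),
                  TEST = NA)
DF2 <- data.frame(A = rep(c("A","B"), 3),
                  B = rep(c("C","D","E"), 2),
                  VAL = 1:6)

DF1$row_id <- seq_len(nrow(DF1))                  # remember the original order
tmp <- merge(DF1, DF2, by = c("A","B"), all.x = TRUE)
tmp <- tmp[order(tmp$row_id), ]                   # restore the original order
DF1$TEST <- tmp$VAL
DF1$row_id <- NULL                                # drop the helper column
```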

As akrun mentioned in the comments, your lookup table (DF2) needs to be reduced to just its unique A/B combinations. For your current data frame this isn't a problem, but you will need additional rules if there can be multiple possible values for the same combination. From there, the solution is easy:

DF2.u <- unique(DF2)
DF3 <- merge(DF1, DF2.u, all = T)

Note that this will produce a new data frame with an empty TEST column (all values NA) and a VAL column taken from DF2. To do exactly what you wanted (replace TEST with VAL where possible), here is some slightly clunkier code, which re-aligns by key since merge() sorts its result by the key columns and may not preserve DF1's row order:

DF1$TEST <- DF3$VAL[match(paste(DF1$A, DF1$B), paste(DF3$A, DF3$B))]
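Since the question asks for something that scales to data frames with many matching columns, here is a match()-based sketch of my own (not part of the answer): build the composite key from any vector of key column names. The keys object and the two-row lookup table are illustrative:

```r
DF1 <- data.frame(A = rep(c("A","B"), 18),
                  B = rep(c("C","D","E"), 12),
                  NUM = rnorm(36, 10, 1),
                  TEST = NA)
# Lookup table already reduced to unique key combinations
DF2.u <- data.frame(A = "A", B = c("C","D"), VAL = c(1, 3))

keys <- c("A", "B")  # add more column names here to match on more columns
# Paste the key columns row-wise into one composite key per row, then match
idx <- match(do.call(paste, DF1[keys]), do.call(paste, DF2.u[keys]))
DF1$TEST <- DF2.u$VAL[idx]
```

Unlike the merge() route, this leaves DF1's rows exactly where they were, so no re-sorting step is needed.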

EDIT: in response to your question, you can boil down DF2 quite simply if necessary:

DF2$C <- 1:12             # now unique() on the whole data frame won't collapse rows
DF2.u <- unique(DF2[1:3]) # so take unique rows of just the key/value columns

 A B VAL
1 A C   1
2 A D   3
  • In my actual example, I have columns in DF2 which are numeric, and so if I passed the entire DF then I would have a unique DF that was too large. Can I subset DF2 to rows which are only unique in the factor variables? – goldisfine Mar 23 '15 at 14:56
  • Of course, the actual indices you take will vary based on what your extra columns are and how they are arranged – Joe Mar 23 '15 at 15:02