0

I have two data frames that are of the same length that share 2 columns with the same data ('CODE' and 'TYPE'). What I need to do is match each unique combination of 'CODE' and 'TYPE' found in DF1 and DF2 and assign the unique ID's from the resulting match from DF2 to DF1.

My current solution uses a loop function and while it does work it is quite slow for the volume of data that I need to use this with. Does anyone know how the same result can be procuded but quicker? The example below is a neater version of what my actual data looks like. Thank you.

EDIT: In the example DF1 contains 50 CODE 'M001' of which 10 are assigned as TYPE 1. In DF2 there are also 10 TYPE 1's for CODE 'M001' but each of these has a unique ID. This means each of these 10 unique ID's are equally valid when being matched based on the 'CODE' and 'TYPE' values. This means when you use merge() it returns 10 results for each row in DF1 when I am only after 1. The solutions from How to join (merge) data frames (inner, outer, left, right)? do not appear to solve this problem (unless I missed it).

DF1 = data.frame(matrix("", ncol = 5, nrow = 100))
DF1$X1 <- rep(c("K001","K002","K003","K004"), each=25)
DF1$X2 <- rep(c("M001","M002"), each=50)
DF1$X3 <- rep(c(1,3,5,8,10,1,3,5,8,10), each=10)
DF1$X4 <- 0
DF1$X5 <- 1:100
colnames(DF1) <- c("AREA", "CODE", "TYPE", "ID","AREA_ID")

DF2 = data.frame(matrix("", ncol = 3, nrow = 100))
DF2$X1 <- 1:100
DF2$X1 <- sprintf('G%i',DF2$X1)
DF2$X2 <- rep(c(1,3,5,8,10,1,3,5,8,10), each=10)
DF2$X3 <- rep(c("M001","M002"), each=50)
colnames(DF2) <- c("ID", "TYPE", "CODE")
DF2 <- DF2[sample(nrow(DF2)),]

for (a in c(1,3,5,8,10)){
for (b in 1:length(unique(DF1$CODE))){
LDF1 <- with(DF1, DF1[CODE == unique(DF1$CODE)[b] & TYPE == a,])
LDF2 <- with(DF2, DF2[CODE == unique(DF2$CODE)[b] & TYPE == a,])
LDF1$ID <- as.character(LDF2$ID)
DF1$ID[match(DF1$AREA_ID, LDF1$AREA_ID, nomatch = 0) != 0] <- LDF1$ID[match(DF1$AREA_ID, LDF1$AREA_ID, nomatch = 0)]
}
}
Community
  • 1
  • 1
Chris
  • 1,197
  • 9
  • 28
  • maybe just `merge(df1, df2, by=c("code", "type"), all.x=TRUE)` or similar. – lmo Jan 26 '17 at 12:39
  • Thanks, but the output of this is a 1000 row data frame as for each combination of CODE and TYPE there are 10 unique ID's in the example I have given. I am after a solution that offers 1 unique ID for per combination row – Chris Jan 26 '17 at 14:16

0 Answers0