I have two dataframes in R - TABLEA (~6,000 rows) and TABLEB (~30,000 rows). All columns are character format.
TABLEA: COL1 - COL2 - COL3 - COL4
TABLEB: COL1 - COL2 - COL3 - COL4 - COL5 - COL6 - COL7
I would like to add a 5th column to TABLEA by taking the values of COL1-COL4 for each row in TABLEA and performing a regex match on TABLEB for the same columns. The value of COL5 from TABLEB should then be added to the appropriate row of TABLEA at COL5.
I have written a function to do this and tried a fop-loop and mapply technique but it is taking a very long time to run. I suspect that my function to perform the lookup is terribly inefficient and that there must be a better way to do this.
Any helpful suggestions or tips on how I can achieve this?
getValue<-function(input1, input2, input3, input4)
{
#filter TABLEB on 4 columns matching the 4 inputs above.
FILTERED <- TABLEB %>% filter(str_detect(input1, COL1)) %>% filter(str_detect(input2, COL2)) %>%
filter(str_detect(input3, COL3)) %>% filter(str_detect(input4, COL4)) %>%
#select the 1st value from column 5, in case of multiple matches
select(COL5) %>% head(1)
VALUE<-as.character(FILTERED$COL5)
#if no match found
if(length(VALUE) == 0) VALUE = "UNKNOWN"
return(VALUE)
}
#for-loop method
for(i in 1:nrow(TABLEA)) {
TABLEA[i, ]$COL5<-getValue(TABLEA[i, ]$COL1, TABLEA[i, ]$COL2, TABLEA[i, ]$COL3, TABLEA[i, ]$COL4)
#mapply method
TABLEA$COL5<-mapply(getValue, TABLEA$COL1, TABLEA$COL2, TABLEA$COL3, TABLEA$COL4)
}