Iteratively return matches among three data frames using nested criteria

Question

I have the following three data frames:

df1<- structure(list(plot = c(1L, 1L, 1L, 2L, 2L, 2L), lepsp = structure(c(1L, 
2L, 3L, 3L, 4L, 5L), .Label = c("lepA", "lepB", "lepC", "lepD", 
"lepE"), class = "factor"), count = c(1L, 2L, 3L, 4L, 1L, 3L)), class = "data.frame", 
row.names = c(NA, -6L))

df2<-structure(list(plot = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L), plantsp = structure(c(12L, 13L, 1L, 
2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 12L, 13L, 14L, 11L), .Label = c("H", 
"I", "J", "K", "L", "M", "O", "P", "Q", "S", "U", "X", "Y", "Z"
), class = "factor"), leafArea = c(1L, 5L, 5L, 10L, 20L, 11L, 
12L, 8L, 1L, 5L, 10L, 15L, 20L, 12L, 13L, 2L)), class = "data.frame", row.names = c(NA, 
-16L))

df3<-structure(list(lepsp = structure(c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 
3L, 3L, 3L, 3L, 4L, 5L), .Label = c("lepA", "lepB", "lepC", "lepD", 
"lepE"), class = "factor"), plantsp = structure(c(6L, 7L, 8L, 
6L, 5L, 2L, 3L, 4L, 1L, 6L, 8L, 8L, 1L), .Label = c("S", "T", 
"U", "V", "W", "X", "Y", "Z"), class = "factor")), class = "data.frame", row.names = c(NA, 
-13L))

Essentially I need to iterate through df1 for unique subsets based on two factor levels. Upon each iteration, I need to find matches between df1 and df2 for a particular column. Among the matches found between df2 and df1, I need to take that subset of rows in df2 and find matches with df3 based on a separate set of criteria and return rows that match for a different factor. To summarize, specific to the data frames posted above:

For each ith df1$plot and jth df1$lepsp, subset rows in df2 for those entries in df1$plot and df2$plot that match. Similarly and secondly, subset rows in df3 for those entries in df1$lepsp and df3$lepsp that match.
Among the subsetted df2 and df3 from step 1 above, for those levels of df3$plantsp that are also in df2$plantsp return the matching rows in df2.
Return a data frame that indexes the associated ith df1$plot and jth df1$lepsp and the associated rows in df2 that matched based on criteria found in step2
Iterate through all levels of df1$lepsp within each df1$plot.

The result output would be as follows:

result<- structure(list(plot = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L
), lepsp = structure(c(1L, 1L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 5L
), .Label = c("lepA", "lepB", "lepC", "lepD", "lepE"), class = "factor"), 
lepcount = c(1L, 1L, 2L, 3L, 4L, 4L, 4L, 4L, 1L, 3L), plantsp = structure(c(3L, 
4L, 3L, 3L, 2L, 1L, 3L, 5L, 5L, 1L), .Label = c("S", "U", 
"X", "Y", "Z"), class = "factor"), leafarea = c(1L, 5L, 1L, 
1L, 2L, 15L, 20L, 13L, 13L, 15L)), class = "data.frame", row.names = c(NA, 
-10L))

Given that this has such nested structure I struggle to think of a method that would work to put all parts together, however I know that the following functions may be useful:

for (i in unique(levels(df1$plot)){
 for( j in  unique(levels(df1$lepsp)){
 sub1<- df2[which(df1$plot %in% df2$plot),]
 sub2<- df3[which(df2$lepsp %in% df3$lepsp),]
 result <- data.frame(plot=unique(df1$plot),lepsp=unique(df1$lepsp),
                   plantsp=df2$plantsp,leafArea=df2$leafArea)}
 return(result)
}

score 1 · Accepted Answer · answered May 01 '20 at 21:57

1

We could keep the datasets in a list and use merge with Reduce

out <- Reduce(function(...) merge(...), list(df1, df2, df3))

Or with tidyverse

library(dplyr)
library(purrr)
list(df1, df2, df3) %>%
      reduce(inner_join)

answered May 01 '20 at 21:57

akrun

874,273
37
540
662

2

Voted for your solution! I guess a bit shorter base R version is `Reduce(merge,list(df1,df2,df3))` :P – ThomasIsCoding May 01 '20 at 22:29

Iteratively return matches among three data frames using nested criteria

1 Answers1