I have the following three data frames:
df1<- structure(list(plot = c(1L, 1L, 1L, 2L, 2L, 2L), lepsp = structure(c(1L,
2L, 3L, 3L, 4L, 5L), .Label = c("lepA", "lepB", "lepC", "lepD",
"lepE"), class = "factor"), count = c(1L, 2L, 3L, 4L, 1L, 3L)), class = "data.frame",
row.names = c(NA, -6L))
df2<-structure(list(plot = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L), plantsp = structure(c(12L, 13L, 1L,
2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 12L, 13L, 14L, 11L), .Label = c("H",
"I", "J", "K", "L", "M", "O", "P", "Q", "S", "U", "X", "Y", "Z"
), class = "factor"), leafArea = c(1L, 5L, 5L, 10L, 20L, 11L,
12L, 8L, 1L, 5L, 10L, 15L, 20L, 12L, 13L, 2L)), class = "data.frame", row.names = c(NA,
-16L))
df3<-structure(list(lepsp = structure(c(1L, 1L, 1L, 2L, 2L, 3L, 3L,
3L, 3L, 3L, 3L, 4L, 5L), .Label = c("lepA", "lepB", "lepC", "lepD",
"lepE"), class = "factor"), plantsp = structure(c(6L, 7L, 8L,
6L, 5L, 2L, 3L, 4L, 1L, 6L, 8L, 8L, 1L), .Label = c("S", "T",
"U", "V", "W", "X", "Y", "Z"), class = "factor")), class = "data.frame", row.names = c(NA,
-13L))
Essentially I need to iterate through df1
for unique subsets based on two factor levels. Upon each iteration, I need to find matches between df1
and df2
for a particular column. Among the matches found between df2
and df1
, I need to take that subset of rows in df2
and find matches with df3
based on a separate set of criteria and return rows that match for a different factor. To summarize, specific to the data frames posted above:
- For each ith
df1$plot
and jthdf1$lepsp
, subset rows indf2
for those entries indf1$plot
anddf2$plot
that match. Similarly and secondly, subset rows indf3
for those entries indf1$lepsp
anddf3$lepsp
that match. - Among the subsetted
df2
anddf3
from step 1 above, for those levels ofdf3$plantsp
that are also indf2$plantsp
return the matching rows indf2
. - Return a data frame that indexes the associated ith
df1$plot
and jthdf1$lepsp
and the associated rows indf2
that matched based on criteria found in step2 - Iterate through all levels of
df1$lepsp
within eachdf1$plot
.
The result output would be as follows:
result<- structure(list(plot = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L
), lepsp = structure(c(1L, 1L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 5L
), .Label = c("lepA", "lepB", "lepC", "lepD", "lepE"), class = "factor"),
lepcount = c(1L, 1L, 2L, 3L, 4L, 4L, 4L, 4L, 1L, 3L), plantsp = structure(c(3L,
4L, 3L, 3L, 2L, 1L, 3L, 5L, 5L, 1L), .Label = c("S", "U",
"X", "Y", "Z"), class = "factor"), leafarea = c(1L, 5L, 1L,
1L, 2L, 15L, 20L, 13L, 13L, 15L)), class = "data.frame", row.names = c(NA,
-10L))
Given that this has such nested structure I struggle to think of a method that would work to put all parts together, however I know that the following functions may be useful:
for (i in unique(levels(df1$plot)){
for( j in unique(levels(df1$lepsp)){
sub1<- df2[which(df1$plot %in% df2$plot),]
sub2<- df3[which(df2$lepsp %in% df3$lepsp),]
result <- data.frame(plot=unique(df1$plot),lepsp=unique(df1$lepsp),
plantsp=df2$plantsp,leafArea=df2$leafArea)}
return(result)
}