I have a very large reference file with millions of pairwise comparisons between thousands of objects ("OTUs"). The data frame is in long format:
'data.frame': 14845516 obs. of 3 variables:
$ OTU1 : chr "0" "0" "0" "0" ...
$ OTU2 : chr "8192" "1" "8194" "3" ...
$ gendist: num 78.7 77.8 77.6 74.4 75.3 ...
I also have a much smaller subset with observed data (slightly different structure):
'data.frame': 286903 obs. of 3 variables:
$ OTU1 : chr "1239" "1603" "2584" "1120" ...
$ OTU2 : chr "12136" "12136" "12136" "12136" ...
$ ecodist: num 2.08 1.85 2 1.73 1.53 ...
- attr(*, "na.action")=Class 'omit' Named int [1:287661] 1 759 760 1517 1518 1519 2275 2276 2277 2278 ...
.. ..- attr(*, "names")= chr [1:287661] "1" "759" "760" "1517" ...
Again, it's a pairwise comparison of objects ("OTUs"). All objects in the smaller dataset are also in the reference dataset.
I want to reduce the reference so that it only contains objects that are also found in the smaller dataset. It is very important that the filtering is applied to both columns (OTU1 and OTU2).
Here is toy data:
library(reshape)
###reference
Ref <- cor(as.data.frame(matrix(rnorm(100),10,10)))
row.names(Ref) <- colnames(Ref) <- LETTERS[1:10]
Ref[upper.tri(Ref)] <- NA
diag(Ref) <- NA
Ref.m <- na.omit(melt(Ref, varnames = c('row', 'col')))
###query
tmp <- cor(as.data.frame(matrix(rnorm(25),5,5)))
row.names(tmp) <- colnames(tmp) <- LETTERS[1:5]
tmp[upper.tri(tmp)] <- NA
diag(tmp) <- NA
tmp.m <- na.omit(melt(tmp, varnames = c('row', 'col')))
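To make the goal concrete, here is a minimal base-R sketch of the kind of reduction I mean, assuming that "found in the smaller dataset" means the OTU appears in either column of the observed data. The data frames `ref` and `obs` and their values are made-up stand-ins that mirror the real column names (OTU1, OTU2); they are not the real data, and this is only one possible reading of the requirement.

```r
# Hypothetical miniature stand-ins for the reference and the observed data
ref <- data.frame(
  OTU1    = c("0", "0", "1", "1", "2", "3"),
  OTU2    = c("1", "2", "2", "3", "3", "4"),
  gendist = c(78.7, 77.8, 77.6, 74.4, 75.3, 70.1),
  stringsAsFactors = FALSE
)
obs <- data.frame(
  OTU1    = c("1", "1", "2"),
  OTU2    = c("2", "3", "3"),
  ecodist = c(2.08, 1.85, 2.00),
  stringsAsFactors = FALSE
)

# Every OTU that appears in either column of the observed data
keep <- union(obs$OTU1, obs$OTU2)

# Keep a reference row only if BOTH of its OTUs are in that set
ref_sub <- ref[ref$OTU1 %in% keep & ref$OTU2 %in% keep, ]
```

With the toy data above, the columns are called `row` and `col` instead of OTU1 and OTU2, so the same idea would presumably use `Ref.m$row`, `Ref.m$col`, `tmp.m$row` and `tmp.m$col` (converted with `as.character()` first, since `melt` may return factors).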