The OP has requested to find all distance comparisons from table2
for samples which share a section in table1
.
This can be achieved by two different approaches:
- Look up the respective section ids for
Sample1
and Sample2
each in table1
and keep only those rows of table2
where the section ids match.
- Create all unique combinations of sample ids for each section in
table1
and find the appropriate entries in table2
(if any).
Approach 1
Base R
tmp <- merge(table2, table1, by.x = "Sample1", by.y = "Sample")
tmp <- merge(tmp, table1, by.x = "Sample2", by.y = "Sample")
tmp[tmp$Section.x == tmp$Section.y, c("Sample2", "Sample1", "distance")]
Sample2 Sample1 distance
1 2 1 10
2 3 1 1
3 3 2 5
dplyr
library(dplyr)
table2 %>%
inner_join(table1, by = c(Sample1 = "Sample")) %>%
inner_join(table1, by = c(Sample2 = "Sample")) %>%
filter(Section.x == Section.y) %>%
select(-Section.x, -Section.y)
Sample1 Sample2 distance
1 1 2 10
2 1 3 1
3 2 3 5
data.table
Using nested joins
library(data.table)
tmp <- setDT(table1)[setDT(table2), on = .(Sample == Sample1)]
table1[tmp, on = .(Sample == Sample2)][
Section == i.Section, .(Sample1 = i.Sample, Sample2 = Sample, distance)]
using merge() and chained data.table expressions
tmp <- merge(setDT(table2), setDT(table1), by.x = "Sample1", by.y = "Sample")
merge(tmp, table1, by.x = "Sample2", by.y = "Sample")[
Section.x == Section.y, -c("Section.x", "Section.y")]
Sample2 Sample1 distance
1: 2 1 10
2: 3 1 1
3: 3 2 5
Approach 2
Base R
table1_cross <- do.call(rbind, lst <- lapply(
split(table1, table1$Section),
function(x) as.data.frame(combinat::combn2(x$Sample))))
merge(table2, table1_cross, by.x = c("Sample1", "Sample2"), by.y = c("V1", "V2"))
Here, the handy combn2(x)
function is used which generates all combinations of the elements of x taken two at a time, e.g.,
combinat::combn2(1:3)
[,1] [,2]
[1,] 1 2
[2,] 1 3
[3,] 2 3
The tedious part is to apply combn2()
to each group of Section
separately and to create a data.frame which can be merged, finally.
dplyr
This is a streamlined version of www's approach
full_join(table1, table1, by = "Section") %>%
filter(Sample.x < Sample.y) %>%
semi_join(x = table2, y = ., by = c(Sample1 = "Sample.x", Sample2 = "Sample.y"))
Non-equi self join
library(data.table)
setDT(table2)[setDT(table1)[table1, on = .(Section, Sample < Sample), allow = TRUE,
.(Section, Sample1 = x.Sample, Sample2 = i.Sample)],
on = .(Sample1, Sample2), nomatch = 0L]
Sample1 Sample2 distance Section
1: 1 2 10 1
2: 1 3 1 1
3: 2 3 5 1
Here, a non-equi join is used to create the unique combinations of Sample
for each Section
. This is equivalent to using combn2()
:
setDT(table1)[table1, on = .(Section, Sample < Sample), allow = TRUE,
.(Section, Sample1 = x.Sample, Sample2 = i.Sample)]
Section Sample1 Sample2
1: 1 NA 1
2: 1 1 2
3: 1 1 3
4: 1 2 3
5: 2 NA 4
6: 2 4 5
7: 3 NA 6
The NA
rows will be removed in the final join.