0

I have two tables; table1 and table2, where table1 is much bigger than table2, but table2 is not fully contained in table1. I also have two ID columns - ID1 and ID2 in each table. I want to obtain the rows in table1 and table 2 in which the two ID columns coincide. If a pairing of ID's is in one table and not the other then the row should not be returned.

I tried t1[which(t1$ID1 == t2$ID1 & t1$ID2 == t2$ID2

It said that the longer object length is not a multiple of shorter object length. Any ideas?

thelatemail
  • 91,185
  • 12
  • 128
  • 188
Governor
  • 300
  • 1
  • 10
  • 4
    This is an inner join I think - https://stackoverflow.com/questions/1299871/how-to-join-merge-data-frames-inner-outer-left-right – thelatemail Feb 24 '23 at 01:07

2 Answers2

3

With dplyr::semi_join() (and borrowing @thesixmax’s example data):

library(dplyr)

table1 %>%
  semi_join(table2, by = c("ID_1", "ID_2"))
#   ID_1 ID_2 val
# 1  0_2  1_2   2
# 2  0_4  1_4   4

table2 %>%
  semi_join(table1, by = c("ID_1", "ID_2"))
#   ID_1 ID_2 val
# 1  0_2  1_2   1
# 2  0_4  1_4   2
zephryl
  • 14,633
  • 3
  • 11
  • 30
2

Simple repex:

table1 <- data.frame(
  "ID_1" = c("0_1", "0_2", "0_3", "0_4", "0_5"),
  "ID_2" = c("1_1", "1_2", "1_3", "1_4", "1_5"),
  val = c(1, 2, 3, 4, 5)
)

table2 <- data.frame(
  "ID_1" = c("0_2", "0_4", "0_6", "0_7", "0_8", "0_9", "0_10"),
  "ID_2" = c("1_2", "1_4", "1_6", "1_7", "1_8", "1_9", "1_10"),
  val = c(1, 2, 3, 4, 5, 6, 7)
)

A solution using base R:

ids1 <- which(interaction(table1[,c("ID_1", "ID_2")]) %in% 
               interaction(table2[,c("ID_1", "ID_2")]))
ids2 <- which(interaction(table2[,c("ID_1", "ID_2")]) %in%
                interaction(table1[,c("ID_1", "ID_2")]))
overlap1 <- table1[ids1,]
overlap2 <- table2[ids2,]
thesixmax
  • 164
  • 1
  • 9
  • 2
    You can omit the `which()` and this would still work (just using logical indexing instead of positions). – zephryl Feb 24 '23 at 01:56