I have a genetic dataset of IDs (dataset1) and a dataset of IDs which interact with each other (dataset2). I am trying to count IDs in dataset1 which appear in either of 2 interaction columns in dataset2 and also record which are the interacting/matching IDs in a 3rd column.
Dataset1:
ID
1
2
3
Dataset2:
Interactor1 Interactor2
1 5
2 3
1 10
Output:
ID InteractionCount Interactors
1 2 5, 10
2 1 3
3 1 2
So the output contains all IDs of dataset1 and a count of those IDs also appear in either column 1 or 2 of dataset2, and if it did appear it also stores which ID numbers in dataset2 it interacts with.
I have a biology background, so have guessed at approaching this, so far I've managed to use merge()
and setDT(mergeddata)[, .N, by=ID]
to try to count the dataset1 IDs which appear in dataset2, but I'm not sure if this is the right approach to be able to add in the creation of the column storing the interacting IDs. Any help on possible functions which can store matched IDs in a 3rd column would be appreciated.
Input data:
dput(dataset1)
structure(list(ID = 1:3), row.names = c(NA, -3L), class = c("data.table",
"data.frame"))
dput(dataset2)
structure(list(Interactor1 = c(1L, 2L, 1L), Interactor2 = c(5L,
3L, 10L)), row.names = c(NA, -3L), class = c("data.table", "data.frame"
))