Merging two datasets in a one-to-many relationship

Question

Table 1 (900 rows) identifies the regulators of each cluster:

reg cluster weight
a   0   59
b   0   52
e   1   46
f   1   46
g   2   66
h   2   18

Table 2 (90 rows) identifies the genes belonging to each cluster:

gene cluster 
1   0
2   1
3   0
4   2
5   0
6   0
7   1

I want to create a table 3 from these two that looks like this:

reg gene weight
a   1   59
a   3   59
a   5   59
a   6   59
b   1   52
b   3   52
b   5   52
b   6   52
e   2   46
e   7   46
…

i.e., I want a list of every interaction between regs and genes. How can I do this in R?

Welcome to SO! It pays off to look at previous questions in order to get an idea on how to do something. The answer provided is explained in more detail in the question I linked to. If in doubt about how to use this site, please take a look at http://stackoverflow.com/help — Joris Meys, Aug 26 '14 at 13:06

score 2 · Answer 1 · answered Aug 26 '14 at 13:03

merge is made for this type of stuff:

merge(reg, gene)

Produces

   cluster reg weight gene
1        0   a     59    1
2        0   a     59    3
3        0   a     59    5
4        0   a     59    6
5        0   b     52    1
6        0   b     52    3
7        0   b     52    5
8        0   b     52    6
9        1   e     46    2
10       1   e     46    7
11       1   f     46    2
12       1   f     46    7
13       2   g     66    4
14       2   h     18    4

Merge automatically joins on the common column names (cluster in this case), though you can change that if you want. You can also easily drop the cluster column if you don't want it with merge(reg, gene)[-1].

data.table is a faster option if you have large data sets:

library(data.table)
data.table(reg, key="cluster")[
  data.table(gene, key="cluster"),
  allow.cartesian=TRUE
]

Joins here work by specifying the key columns to join on.

Merging two datasets in a one-to-many relationship

1 Answers1