1

I have two data.table object:

dt <- data.table(
  a = c(1, 1, 2), 
  s = c(1, 2, 5), 
  o = c("A", "A", "A")
)

dt2 <- data.table(
  a = c(1, 1, 2, 2),
  g = c(1, 2, 3, 4)  
)  

There are two elements of g for each a in dt2. I want to match a in dt with the a in dt2. Then in dt, each row will duplicate 2 times (because there are 2 element for each a in dt2. Certainly, it may be different number of elements for each a in dt2. That means the condition in dt2 determine how I append the corresponding g of dt2 into dt), and append the corresponding g of dt2 at the last column of dt. The final table should be like this:

fnl <- data.table(
  a = c(1, 1, 1, 1, 2, 2), 
  s = c(1, 1, 2, 2, 5, 5), 
  o = c("A", "A", "A", "A", "A", "A"),
  g = c(1, 2, 1, 2, 3, 4)
)

For clear, here is the two input tables and one output table.

> dt
   a s o
1: 1 1 A
2: 1 2 A
3: 2 5 A
> dt2
   a g
1: 1 1
2: 1 2
3: 2 3
4: 2 4
> fnl
   a s o g
1: 1 1 A 1
2: 1 1 A 2
3: 1 2 A 1
4: 1 2 A 2
5: 2 5 A 3
6: 2 5 A 4

I know that using for-loop I can do the task, but my data is too big, can you help figure out what is the effective way to do this?

1 Answers1

2

Since you are using data.tables it's pretty easy:

dt[dt2, on='a']

gives you

> dt[dt2, on='a']
   a s o g
1: 1 1 A 1
2: 1 2 A 1
3: 1 1 A 2
4: 1 2 A 2
5: 2 5 A 3
6: 2 5 A 4
Martin Gal
  • 16,640
  • 5
  • 21
  • 39
  • Thank you. First time to know this powerful function. – BioChemoinformatics May 06 '20 at 14:44
  • 1
    Look here https://rstudio-pubs-static.s3.amazonaws.com/52230_5ae0d25125b544caab32f75f0360e775.html and there https://stackoverflow.com/questions/12773822/why-does-xy-join-of-data-tables-not-allow-a-full-outer-join-or-a-left-join for more information. This JOINing function of data.tables is really awesome. – Martin Gal May 06 '20 at 17:36