I have two dataframes:

df1:

name  start   end       prop
NC12     0   15000     62.62667  
NC12   100   15100     62.62667  
NC14     0   15000     62.66000   
NC14   100   15100     62.62000   
NC14   200   15200     62.67333   
NC15     0   15000     62.66667   

df2:

name    SNPs    type 
NC12    1569    A 
NC12    15002   B
NC12    15007   C
NC14    15165   A
NC14    15187   D
NC15    1572    B

I want to append the value of the type column from df2 to df1 wherever an SNP position in df2 (column SNPs) falls within the start-end range of a row in df1 and the name values match in both. When several SNPs fall within the same range, their type values should be joined with a delimiter.

If no SNP matches a row in df1, type should be null for that row.

So the result of processing the two dataframes would be:

df3:

name  start   end       prop     type
NC12     0   15000     62.62667  A
NC12   100   15100     62.62667  A,B,C
NC14     0   15000     62.66000  null 
NC14   100   15100     62.62000  null
NC14   200   15200     62.67333  A,D
NC15     0   15000     62.66667  B
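
To make the matching rule above concrete, here is a minimal, library-agnostic sketch in plain Python. It is not the data.table or IRanges approach suggested in the comments. The rows are copied from df1 and df2, while the function name `annotate`, the `sep` parameter, and the choice to treat both start and end as inclusive are assumptions made for illustration.

```python
from collections import defaultdict

# Sample rows copied from df1 and df2 in the question.
df1 = [
    {"name": "NC12", "start": 0,   "end": 15000, "prop": 62.62667},
    {"name": "NC12", "start": 100, "end": 15100, "prop": 62.62667},
    {"name": "NC14", "start": 0,   "end": 15000, "prop": 62.66000},
    {"name": "NC14", "start": 100, "end": 15100, "prop": 62.62000},
    {"name": "NC14", "start": 200, "end": 15200, "prop": 62.67333},
    {"name": "NC15", "start": 0,   "end": 15000, "prop": 62.66667},
]
df2 = [
    {"name": "NC12", "SNPs": 1569,  "type": "A"},
    {"name": "NC12", "SNPs": 15002, "type": "B"},
    {"name": "NC12", "SNPs": 15007, "type": "C"},
    {"name": "NC14", "SNPs": 15165, "type": "A"},
    {"name": "NC14", "SNPs": 15187, "type": "D"},
    {"name": "NC15", "SNPs": 1572,  "type": "B"},
]


def annotate(ranges, snps, sep=","):
    """Return copies of `ranges` with a `type` key listing matching SNP types.

    `annotate` and `sep` are hypothetical names used only for this sketch.
    """
    # Group SNP positions by name so each range only scans its own group.
    by_name = defaultdict(list)
    for snp in snps:
        by_name[snp["name"]].append((snp["SNPs"], snp["type"]))

    result = []
    for row in ranges:
        # Assumption: both interval endpoints are inclusive.
        hits = [
            snp_type
            for pos, snp_type in by_name.get(row["name"], [])
            if row["start"] <= pos <= row["end"]
        ]
        # Rows without any matching SNP get None (the "null" case).
        result.append({**row, "type": sep.join(hits) if hits else None})
    return result


df3 = annotate(df1, df2)
for row in df3:
    print(row["name"], row["start"], row["end"], row["prop"], row["type"])
```

The loop compares every range against every SNP with the same name, which is fine for small inputs. For large tables, a dedicated overlap join such as the ones mentioned in the comments is likely to scale better.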

Any ideas?

msakya
  • @Arun has published an update to pkg:data.table that should be helpful if efficiency is needed. Otherwise you should look at the bioc-package `IRanges`. – IRTFM Dec 10 '14 at 02:32
  • Specifically look at rolling joins with data.table. See http://stackoverflow.com/questions/24480031/roll-join-with-start-end-window – MrFlick Dec 10 '14 at 03:07
