I imagine there is some way to do this with sqldf, but I'm not familiar enough with that package's syntax to get it to work. Here's the issue:

I have two data frames, each of which describes genomic regions and contains some other data. I need to combine the two wherever a region described in one data frame falls within a region of the other.

One data frame, g, looks like this (my real data has other columns):

      start_position end_position
    1       22926178     22928035
    2       22887317     22889471
    3       22876403     22884442
    4       22862447     22866319
    5       22822490     22827551

The other, l, looks like this (this sample has a name column):

                 name    start      end
101     GRMZM2G001024 11149187 11511198
589     GRMZM2G575546 24382534 24860958
7859    GRMZM2G441511 22762447 23762447
658  AC184765.4_FG005 26282236 26682919
14      GRMZM2G396835 10009264 10402790

I need to merge the two data frames wherever the value in the start_position OR end_position column of g falls within the start-end range in l, returning only the rows of l that have a match. I've been trying to get findInterval() to do the job, but haven't been able to return a merged data frame. Any ideas?

My data:

g <- structure(list(start_position = c(22926178L, 22887317L, 22876403L, 
22862447L, 22822490L), end_position = c(22928035L, 22889471L, 
22884442L, 22866319L, 22827551L)), .Names = c("start_position", 
"end_position"), row.names = c(NA, 5L), class = "data.frame")

l <- structure(list(name = structure(c(2L, 12L, 9L, 1L, 8L), .Label = c("AC184765.4_FG005", 
"GRMZM2G001024", "GRMZM2G058655", "GRMZM2G072028", "GRMZM2G157132", 
"GRMZM2G160834", "GRMZM2G166507", "GRMZM2G396835", "GRMZM2G441511", 
"GRMZM2G442645", "GRMZM2G572807", "GRMZM2G575546", "GRMZM2G702094"
), class = "factor"), start = c(11149187L, 24382534L, 22762447L, 
26282236L, 10009264L), end = c(11511198L, 24860958L, 23762447L, 
26682919L, 10402790L)), .Names = c("name", "start", "end"), row.names = c(101L, 
589L, 7859L, 658L, 14L), class = "data.frame")
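
To pin down the matching rule, here is a minimal base R sketch of the condition I mean, written as a brute-force cross join. It builds every g-l pair before filtering, so I would not expect it to scale to genome-sized data; it is only meant to show the intended result. The names pairs, matched, and l_matched are placeholders for illustration, and the two samples are rebuilt with data.frame() (with name as a plain character column rather than a factor) so the block runs on its own.

```r
# Rebuild the two samples so this block is self-contained.
g <- data.frame(
  start_position = c(22926178L, 22887317L, 22876403L, 22862447L, 22822490L),
  end_position   = c(22928035L, 22889471L, 22884442L, 22866319L, 22827551L)
)
l <- data.frame(
  name  = c("GRMZM2G001024", "GRMZM2G575546", "GRMZM2G441511",
            "AC184765.4_FG005", "GRMZM2G396835"),
  start = c(11149187L, 24382534L, 22762447L, 26282236L, 10009264L),
  end   = c(11511198L, 24860958L, 23762447L, 26682919L, 10402790L),
  stringsAsFactors = FALSE
)

# Cross join: merge() with by = NULL returns the Cartesian product,
# i.e. every row of g paired with every row of l.
pairs <- merge(g, l, by = NULL)

# Keep a pair when either endpoint of the g region lies inside [start, end].
start_inside <- pairs$start_position >= pairs$start &
  pairs$start_position <= pairs$end
end_inside <- pairs$end_position >= pairs$start &
  pairs$end_position <= pairs$end
matched <- pairs[start_inside | end_inside, ]

# Rows of l that have at least one matching region in g.
l_matched <- l[l$name %in% matched$name, ]
```

On the sample above, matched should contain the five g regions paired with GRMZM2G441511, and l_matched should contain only that one row of l.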
MHtaylor
  • This sounds similar: http://stackoverflow.com/questions/11892241/merge-by-range-in-r-applying-loops/11893440#11893440 – flodel Apr 07 '14 at 00:53
  • I could be mistaken, but the code I wrote is finding that no matches occur in your example. I may be misunderstanding the question though. – Rich Scriven Apr 07 '14 at 01:04
  • I just pulled a random sample - there may not be any matches there. – MHtaylor Apr 07 '14 at 01:21
  • May not? Or ARE not? It's difficult to post a good answer without knowing if it works correctly. – Rich Scriven Apr 07 '14 at 01:25
  • Please provide the corresponding output. If the output has no rows for the inputs shown please improve the example in the question. – G. Grothendieck Apr 07 '14 at 02:16
  • There should be at least one match now, my bad. I think GRanges should work, similar to what is seen in flodel's link, though I'll have to incorporate a few extra annotations when creating the object. – MHtaylor Apr 07 '14 at 13:10
