Obtain the specific range that overlaps

Question

I have two dataframes: cnv_1

chr     start   end
3   62860387    63000898
12  31296219    31406907
14  39762575    39769146
19  43372386    43519442
19  56419263    56572829

cnv_2

chr     start   end
6   30994163    30995078
19  43403531    44608011
18  1731154     1833682
3   46985863    47164711

Each has approximately 150000 entries. I would like to know which fragments of cnv_1 overlap in any way with cnv_2 and, most importantly for me, to obtain the specific region that overlaps. For example, applying this to the example data frames should give:

chr     start   end
19  43403531    43519442

thank you very much

have a look here: https://stackoverflow.com/questions/3916195/finding-overlap-in-ranges-with-r – user1981275, Dec 01 '16 at 12:57
thanks @user1981275, but I also need to know the exact range of overlap. I don't know if that's possible with IRanges – Julio Rodríguez, Dec 05 '16 at 14:43

Joe · Answer 1 · 2016-12-03T23:07:45.893

0

Here's a dplyr chain that joins common regions between the two data frames, looks for an overlap and gets the start and end values.

library(dplyr)
# Pair ranges on the same chromosome, keep the pairs that overlap,
# then take the larger start and the smaller end as the common region
inner_join(cnv_1, cnv_2, by="chr") %>% 
  filter(!(start.x > end.y | start.y > end.x)) %>%
  transmute(chr, start.o = pmax(start.x, start.y),
                   end.o = pmin(end.x, end.y))

Output is:

  chr  start.o    end.o
1  19 43403531 43519442

This works symmetrically for the two data frames. If you only want a one-way overlap, you can simplify the filter and transmute expressions as needed.
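
As a minimal, self-contained sketch of the same interval logic (not the exact code above): the common region of two ranges on the same chromosome runs from the larger of the two starts to the smaller of the two ends, and it is empty when that start is greater than that end. The version below uses base R's `merge`, `pmax` and `pmin` instead of dplyr and rebuilds the example data from the question with a numeric `chr` column, so no factor coercion is involved. The names `pairs` and `overlaps` are only illustrative.

```r
# Example data from the question
cnv_1 <- data.frame(
  chr   = c(3, 12, 14, 19, 19),
  start = c(62860387, 31296219, 39762575, 43372386, 56419263),
  end   = c(63000898, 31406907, 39769146, 43519442, 56572829)
)
cnv_2 <- data.frame(
  chr   = c(6, 19, 18, 3),
  start = c(30994163, 43403531, 1731154, 46985863),
  end   = c(30995078, 44608011, 1833682, 47164711)
)

# Pair every cnv_1 range with every cnv_2 range on the same chromosome
pairs <- merge(cnv_1, cnv_2, by = "chr", suffixes = c(".x", ".y"))

# Candidate common region: larger start, smaller end
overlaps <- data.frame(
  chr   = pairs$chr,
  start = pmax(pairs$start.x, pairs$start.y),
  end   = pmin(pairs$end.x, pairs$end.y)
)

# Keep only the pairs whose common region is non-empty
overlaps <- overlaps[overlaps$start <= overlaps$end, ]
print(overlaps)
```

For the example data this leaves a single row: chr 19, from 43403531 to 43519442. Note that `pairs` holds every same-chromosome combination, so with around 150000 rows per table it can become large.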

edited Dec 03 '16 at 23:07

answered Dec 01 '16 at 14:24

Joe


thanks @Joe, but I would like to obtain the range that the two regions have in common; the "start" and the "end" that the two ranges have in common. Thanks a lot – Julio Rodríguez Dec 01 '16 at 15:23
Perfect @Joe. Is it also possible to get the cnv_1 ranges (rows) that do not overlap with any cnv_2 range? I mean, the output would contain the regions that overlap, plus NA for the regions that do not. Thanks. – Julio Rodríguez Dec 02 '16 at 09:33
Glad that works! Would you mind accepting? (the tick below the vote number.) – Joe Dec 02 '16 at 09:49
sorry Joe, but in which package is inner_join? I get this message "Error in eval(expr, envir, enclos) : could not find function "inner_join"" – Julio Rodríguez Dec 02 '16 at 11:00
You have to install and load `dplyr`. – Joe Dec 02 '16 at 12:10
No, it doesn't work. `Warning message: In inner_join_impl(x, y, by$x, by$y, suffix$x, suffix$y) : joining factors with different levels, coercing to character vector` – Julio Rodríguez Dec 05 '16 at 14:34

joel.wilson · Accepted Answer · 2016-12-02T10:16:51.450

0

Based on the link shared in the comments:

cnv_3 <- merge(cnv_1, cnv_2, by = "chr", suffixes = letters[1:2])
# Clip each cnv_1 range (suffix a) to its overlap with the paired cnv_2 range (suffix b).
# The overlap runs from the larger start to the smaller end, which covers full
# containment in either direction as well as both partial-overlap cases.
# Assumes every column, including chr, is numeric.
func <- function(x){
  start <- max(x["starta"], x["startb"])
  end   <- min(x["enda"], x["endb"])
  if (start <= end) {
    x["starta"] <- start
    x["enda"]   <- end
    x
  } else
    c(x[1], rep(NA, length(x)-1))  # no overlap: keep chr, set the rest to NA
}


df <-  data.frame(t(apply(cnv_3, 1, func)))
df <- df[!is.na(df[,1]),][1:3]
colnames(df) <- colnames(cnv_1)
# in case you want all the original cnv_1 rows, with NA for the non-overlapping ones
xxx <- cnv_1[!(cnv_1$chr %in% df$chr),]
xxx$start <- xxx$end <- NA
rbind(xxx, df)
#   chr    start      end
#2   12       NA       NA
#3   14       NA       NA
#31   3       NA       NA
#4   19 43403531 43519442
#5   19       NA       NA
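
A possible variant, sketched here under a few assumptions rather than taken from the answer: to mark non-overlapping rows per cnv_1 row (not per chromosome), tag each cnv_1 row with an index, compute the overlaps, and left-join them back with `merge(..., all.x = TRUE)`. The `id`, `o_start` and `o_end` columns are hypothetical helper names, and unlike the code above this keeps the original `start` and `end` next to the overlap instead of overwriting them.

```r
# Example data from the question
cnv_1 <- data.frame(
  chr   = c(3, 12, 14, 19, 19),
  start = c(62860387, 31296219, 39762575, 43372386, 56419263),
  end   = c(63000898, 31406907, 39769146, 43519442, 56572829)
)
cnv_2 <- data.frame(
  chr   = c(6, 19, 18, 3),
  start = c(30994163, 43403531, 1731154, 46985863),
  end   = c(30995078, 44608011, 1833682, 47164711)
)

# Hypothetical helper column: remembers which cnv_1 row each pair came from
cnv_1$id <- seq_len(nrow(cnv_1))

# All same-chromosome pairs; shared columns become starta/startb and enda/endb
pairs <- merge(cnv_1, cnv_2, by = "chr", suffixes = c("a", "b"))
pairs$o_start <- pmax(pairs$starta, pairs$startb)
pairs$o_end   <- pmin(pairs$enda, pairs$endb)

# Keep only the pairs with a non-empty common region
hits <- pairs[pairs$o_start <= pairs$o_end, c("id", "o_start", "o_end")]

# Left join: every cnv_1 row is kept, with NA where nothing overlaps
result <- merge(cnv_1, hits, by = "id", all.x = TRUE)
result <- result[order(result$id), c("chr", "start", "end", "o_start", "o_end")]
print(result)
```

A cnv_1 row that overlaps several cnv_2 ranges appears once per overlap in `result`; rows with no overlap appear once with NA in `o_start` and `o_end`.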

edited Dec 02 '16 at 10:16

answered Dec 01 '16 at 15:24

joel.wilson


Hi @joel.wilson, would it be possible to also obtain the rows that do not overlap, marked with NA for example? Thanks a lot. – Julio Rodríguez Dec 02 '16 at 09:53
NA's on all columns? – joel.wilson Dec 02 '16 at 09:55
I'm sorry, but it doesn't work; I don't get the specific region that overlaps. – Julio Rodríguez Dec 05 '16 at 14:45
on the real data? – joel.wilson Dec 05 '16 at 14:51
