I have a dataframe df1:
df1 <- read.table(text=" Chr06 79641
Chr06 82862
Chr06 387314
Chr06 656098
Chr06 678491
Chr06 1018696", header=FALSE, stringsAsFactors=FALSE)
I would like to check whether each row in df1 falls within a range in df2 that has the same value in column 1. Column 2 of df2 is the start of a range and column 3 is the end. The ranges do not overlap between rows, and df2 is sorted by column 1 and then column 2. I wrote a loop for this, but I am not happy with it because it takes a long time to run when df1 has a few thousand rows. I would like to find a more efficient way to do this, preferably without looping. Thanks. The df2 data frame:
df2 <- read.table(text=" Chr05 799 870
Chr06 77914 77942
Chr06 78233 78269
Chr06 78719 78836
Chr06 79720 87043
Chr06 87223 87305
Chr06 380020 380060
Chr06 387314 387371
Chr06 654907 654988
Chr06 657929 658057
Chr06 677198 677229
Chr06 679555 680170
Chr06 1015425 1015475
Chr06 1018676 1018736
Chr06 1020564 1020592", header=FALSE, stringsAsFactors=FALSE)
My script:
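# V3 flags whether the position in V2 lies inside any df2 range
df1$V3 <- FALSE
for (i in seq_len(nrow(df1))) {
  for (j in seq_len(nrow(df2))) {
    # same chromosome and start <= position <= end (bounds inclusive)
    if ((df1[i, 1] == df2[j, 1]) && (df1[i, 2] >= df2[j, 2])
        && (df1[i, 2] <= df2[j, 3])) {
      df1[i, 3] <- TRUE
      break  # ranges do not overlap, so the first match is enough
    }
  }
}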
df1
The expected result is df1 with the new logical column V3, as printed by the last line of the script.
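For comparison, here is a minimal sketch of one way the same check might be vectorized in base R. It is not the script above, and it relies on the assumption stated in the question that the ranges within a chromosome do not overlap. The helper name `in_ranges` is hypothetical. It uses `findInterval()` to locate, for each position, the last range start that is less than or equal to it, and then compares the position with that range's end. The only remaining loop runs over distinct chromosomes, not over rows.

```r
df1 <- read.table(text="Chr06 79641
Chr06 82862
Chr06 387314
Chr06 656098
Chr06 678491
Chr06 1018696", header=FALSE, stringsAsFactors=FALSE)

df2 <- read.table(text="Chr05 799 870
Chr06 77914 77942
Chr06 78233 78269
Chr06 78719 78836
Chr06 79720 87043
Chr06 87223 87305
Chr06 380020 380060
Chr06 387314 387371
Chr06 654907 654988
Chr06 657929 658057
Chr06 677198 677229
Chr06 679555 680170
Chr06 1015425 1015475
Chr06 1018676 1018736
Chr06 1020564 1020592", header=FALSE, stringsAsFactors=FALSE)

# Hypothetical helper (not from the original post): returns TRUE for each
# position that lies inside some [start, end] range on the same chromosome.
# Assumes the ranges within a chromosome do not overlap.
in_ranges <- function(chr, pos, r_chr, r_start, r_end) {
  hit <- logical(length(pos))            # all FALSE to begin with
  for (ch in unique(chr)) {              # one pass per chromosome, not per row
    q <- which(chr == ch)                # query rows on this chromosome
    r <- which(r_chr == ch)              # range rows on this chromosome
    if (length(r) == 0L) next
    o <- order(r_start[r])               # findInterval() needs sorted starts
    starts <- r_start[r][o]
    ends <- r_end[r][o]
    k <- findInterval(pos[q], starts)    # index of last start <= pos, 0 if none
    ok <- k > 0L
    hit[q[ok]] <- pos[q][ok] <= ends[k[ok]]
  }
  hit
}

df1$V3 <- in_ranges(df1$V1, df1$V2, df2$V1, df2$V2, df2$V3)
df1
# V3 for the six rows: FALSE TRUE TRUE FALSE FALSE TRUE
```

Both bounds are treated as inclusive, matching the `>=` and `<=` comparisons in the loop. If the ranges could overlap, this shortcut would no longer be valid, because the containing range might not be the one with the nearest start.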