I have two CSV files. One contains two breakpoint positions per row, each with its chromosome number, plus the sample those breakpoints come from. The other contains a start and end position per row, along with a sample name and chromosome number.
Some breakpoint positions fall within the start and end positions in the other file. I want to find any breakpoint positions that do not fall within any of those start-end intervals, where both the chromosome number and the sample name match.
I want to compare each of these positions (pos1 and pos2)
Example of file with breakpoint positions:
     sample chr1     pos1 chr2      pos2
1    A01-28    1 59679925    1 204187341
2    A01-28    1 17727050   21  39859974
3    A01-28    1 40443937    2 179382940
...
5720 Z05-65   14 74930698   14  77657362
4999 Z05-65    8 54849551   11  87898249
5000 Z05-65   14 74928588   14  76065367
to see if any do NOT fall between any of these start and end values
Example of file with start and end positions:
      sample chr     start       end
1     A01-28   1   3218610   6198652
2     A01-28   1   6198745   8625449
3     A01-28   1   8630794   9666687
...
19491 Z05-65   X 142569607 151391630
19492 Z05-65   X 151393577 151394249
19493 Z05-65   X 151394464 154905589
and the chromosome numbers and sample names have to match.
I've read each file into a data frame, but I'm not sure how to go about this. I suspect a for loop could take forever, since one file has 5000+ rows and the other has 19000+. I'm not very proficient in R, and I assume there is some clever way of doing this.
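To make the intended check concrete, here is a minimal base R sketch on made-up toy data. The data frame names `bp` and `seg`, the helper columns `row`, `side` and `inside`, and the toy values are all assumptions for illustration, not the real files. It also assumes that interval endpoints count as inside. It is one possible approach (a join on sample and chromosome followed by a range filter), not necessarily the best one.

```r
# Toy data, made up for illustration; `bp` and `seg` are hypothetical names.
bp <- data.frame(
  sample = c("S1", "S1", "S2"),
  chr1   = c("1", "1", "X"),
  pos1   = c(150, 500, 20),
  chr2   = c("1", "2", "X"),
  pos2   = c(250, 120, 95),
  stringsAsFactors = FALSE
)
seg <- data.frame(
  sample = c("S1", "S1", "S1", "S2"),
  chr    = c("1", "1", "2", "X"),
  start  = c(100, 300, 100, 10),
  end    = c(200, 400, 200, 50),
  stringsAsFactors = FALSE
)

# Stack pos1 and pos2 into one long table, remembering the source row and
# which side the position came from. as.character() guards against chr being
# read as integer in one file and as character (because of "X") in the other.
ends <- rbind(
  data.frame(row = seq_len(nrow(bp)), side = "pos1", sample = bp$sample,
             chr = as.character(bp$chr1), pos = bp$pos1,
             stringsAsFactors = FALSE),
  data.frame(row = seq_len(nrow(bp)), side = "pos2", sample = bp$sample,
             chr = as.character(bp$chr2), pos = bp$pos2,
             stringsAsFactors = FALSE)
)
seg$chr <- as.character(seg$chr)

# Inner join on sample and chromosome, then keep only the pairs where the
# position lies inside the interval (endpoints inclusive, by assumption).
m   <- merge(ends, seg, by = c("sample", "chr"))
hit <- m[m$pos >= m$start & m$pos <= m$end, ]

# A position is "inside" if it matched at least one interval.
ends$inside <- paste(ends$row, ends$side) %in% paste(hit$row, hit$side)
outside <- ends[!ends$inside, ]
print(outside)
```

Here `outside` holds one line per breakpoint position that matched no interval, with `row` pointing back to the row of `bp` and `side` saying whether it was pos1 or pos2. Positions whose sample and chromosome have no intervals at all are also reported as outside, because the inner join drops them.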