I have two genetic datasets. I am trying to find whether a variant at a given position in the genome (file1) falls within the range of any row in another dataset (file2), and then extract the matching rows from file2 to merge with file1. The one condition is that a variant should only be matched against ranges on the same chromosome. For example:
File1:
Chromosome Position
1 3
1 47
2 10
3 2
File2:
Chromosome Start End
1 101 102
1 40 50
2 40 50
3 20 22
Expected output:
Chromosome Start End
1 40 50
#this is the only row whose position range contains a variant from file1 on the same chromosome
Ideally, I would merge the file1 variant into the same row as its matched chromosome, start, and end position from file2, but I am new to R and stuck on the first step: matching a variant based on whether its position falls within a range in the second file. Currently I am trying to adapt:
dt1[ dt2, match := i.,ID #including a made-up ID column for the sake of trying to adapt this code
on = .(Chromosome, Position > Start, Position < End ) ]
However, this doesn't seem to work, and beyond this I don't know how else to start. Any help on how to approach this would be appreciated.
Data:
dput(file1)
structure(list(Chromosome = c(1L, 1L, 2L, 3L), Position = c(3L,
47L, 10L, 2L)), row.names = c(NA, -4L), class = c("data.table",
"data.frame"))
dput(file2)
structure(list(Chromosome = c(1L, 1L, 2L, 3L), Start = c(101L,
40L, 40L, 20L), End = c(102L, 50L, 50L, 22L)), row.names = c(NA,
-4L), class = c("data.table", "data.frame"))
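
For illustration, here is a minimal sketch of how the matching described above might look as a data.table non-equi join on the sample data. It assumes the range bounds are inclusive (Start <= Position <= End), which may not be what is wanted, and the name `matched` is hypothetical. It is one possible direction, not a confirmed solution.

```r
library(data.table)

# Sample data rebuilt from the dput() output above
file1 <- data.table(Chromosome = c(1L, 1L, 2L, 3L),
                    Position   = c(3L, 47L, 10L, 2L))
file2 <- data.table(Chromosome = c(1L, 1L, 2L, 3L),
                    Start      = c(101L, 40L, 40L, 20L),
                    End        = c(102L, 50L, 50L, 22L))

# Non-equi join: for each variant in file1, look up file2 rows on the
# same chromosome whose range contains the variant's position.
# Assumption: bounds are inclusive; use < and > for strict bounds.
# nomatch = NULL drops variants that fall in no range.
matched <- file2[file1,
                 .(Chromosome,
                   Position = i.Position,  # variant position from file1
                   Start    = x.Start,     # original range start from file2
                   End      = x.End),      # original range end from file2
                 on = .(Chromosome, Start <= Position, End >= Position),
                 nomatch = NULL]

print(matched)
```

On the sample data this should keep only the chromosome 1 variant at position 47, paired with the 40 to 50 range, which lines up with the expected output while also carrying the file1 position in the same row.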