This is my first time here, so my apologies if I have done something wrong or confusing. I am working with genomic data and have two data frames. The first one holds the ancestry assigned to each range of SNPs (see table below):
Chrom | Start | End | Ancestry
------|----------|----------|------------
22 | 16495833 | 19868218 | EUR_Patag
22 | 19873357 | 21405110 | Patag_Patag
22 | 21416404 | 21449724 | Patag_UNK
22 | 21458082 | 23704421 | EUR_Patag
22 | 23712647 | 23717466 | Patag_UNK
The second data frame contains the phased genotype for each rsID (see table below):
Chrom | Pos | ID | Genot
------|----------|----------|------
22 | 16495833 | rs116823 | 0\|1
22 | 16620701 | rs635455 | 0\|0
22 | 16648658 | rs445724 | 1\|1
22 | 16872459 | rs827345 | 1\|0
22 | 16880098 | rs309287 | 1\|1
For each SNP in the second data frame, I want to take its position (the "Pos" column), find the range in the first data frame that contains it, and add a new column to the second data frame with the Ancestry of that range.
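What is described above is an interval (range) join: for each SNP, find the range on the same chromosome with Start <= Pos <= End. In R, data.table covers this with non-equi joins and `foverlaps()`. Below is a minimal sketch of the same lookup logic in Python, using only the standard library's `bisect` module rather than data.table. It assumes that ranges on one chromosome do not overlap and that both Start and End are inclusive. The helper names `build_index`, `lookup`, and `annotate` are hypothetical.

```python
from bisect import bisect_right

# Ancestry ranges from the first table: (chrom, start, end, ancestry).
# Assumption: ranges on the same chromosome do not overlap.
ranges = [
    (22, 16495833, 19868218, "EUR_Patag"),
    (22, 19873357, 21405110, "Patag_Patag"),
    (22, 21416404, 21449724, "Patag_UNK"),
    (22, 21458082, 23704421, "EUR_Patag"),
    (22, 23712647, 23717466, "Patag_UNK"),
]

# SNPs from the second table: (chrom, pos, rsid, genotype).
snps = [
    (22, 16495833, "rs116823", "0|1"),
    (22, 16620701, "rs635455", "0|0"),
    (22, 16648658, "rs445724", "1|1"),
    (22, 16872459, "rs827345", "1|0"),
    (22, 16880098, "rs309287", "1|1"),
]


def build_index(ranges):
    """Group ranges by chromosome and sort each group by start position."""
    grouped = {}
    for chrom, start, end, ancestry in ranges:
        grouped.setdefault(chrom, []).append((start, end, ancestry))
    index = {}
    for chrom, intervals in grouped.items():
        intervals.sort()
        # Keep a parallel list of starts so bisect can search it directly.
        index[chrom] = ([start for start, _, _ in intervals], intervals)
    return index


def lookup(index, chrom, pos):
    """Return the ancestry of the range containing pos, or None if there is none."""
    if chrom not in index:
        return None
    starts, intervals = index[chrom]
    # Rightmost range whose start is <= pos (bounds assumed inclusive).
    i = bisect_right(starts, pos) - 1
    if i >= 0:
        start, end, ancestry = intervals[i]
        if start <= pos <= end:
            return ancestry
    return None


def annotate(snps, ranges):
    """Append the matching ancestry (or None) as a new last field of each SNP."""
    index = build_index(ranges)
    return [snp + (lookup(index, snp[0], snp[1]),) for snp in snps]


for row in annotate(snps, ranges):
    print(row)
```

With the sample data, all five SNPs fall inside the first range, so each row gains `'EUR_Patag'`, e.g. `(22, 16495833, 'rs116823', '0|1', 'EUR_Patag')`. A SNP that sits in a gap between two ranges (for example, position 19870000, between 19868218 and 19873357) gets `None`. Sorting each chromosome's ranges once and binary-searching makes each lookup O(log n) instead of a scan over every range.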
While searching, I found that the R package data.table can handle this kind of problem, but unfortunately I was not able to work out how to do it.
I hope my question is clear. Thank you very much for your help!