I have two dataframes: one listing SNPs and their positions, and another listing genes with their start and end coordinates. Using dplyr, I'd like to add a column to the SNP dataframe giving the name of the gene that each SNP falls within, i.e. the gene is on the same chromosome as the SNP and the SNP's position lies between the gene's start and end coordinates, inclusive.
If a SNP doesn't fall within any gene, its gene column should be NA. The chromosomes must match: for example, the position of the second SNP (10) lies within the start/end coordinates of Gene4 (10 to 18), but that is not a match because the SNP is on chromosome 01 and Gene4 is on chromosome 02.
SNP dataframe:
CHR  POS  REF  ALT
01   5    C    T
01   10   G    A
02   5    G    T
02   15   C    A
02   20   T    C
03   10   A    G
03   20   C    T
GENE dataframe:
CHR  START  END  GENE_NAME
01   2      8    Gene1
01   12     20   Gene2
01   25     30   Gene3
02   10     18   Gene4
02   25     35   Gene5
03   5      15   Gene6
Desired output:
CHR  POS  REF  ALT  GENE_NAME
01   5    C    T    Gene1
01   10   G    A    NA
02   5    G    T    NA
02   15   C    A    Gene4
02   20   T    C    NA
03   10   A    G    Gene6
03   20   C    T    NA
Again, I'd like to accomplish this using dplyr. Thanks in advance for any help!
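One way this might be done is a left join on an equality condition for CHR plus an inclusive range condition for POS. The sketch below assumes dplyr >= 1.1.0, which added `join_by()` with `between()` for range joins; it also assumes CHR is stored as character so the leading zeros ("01", "02", ...) compare equal in both tables. The object names `snps`, `genes` and `annotated` are just illustrative.

```r
library(dplyr)  # join_by() with between() requires dplyr >= 1.1.0

# The SNP and gene tables from the question; CHR kept as character
# so "01" in one table matches "01" in the other.
snps <- tibble(
  CHR = c("01", "01", "02", "02", "02", "03", "03"),
  POS = c(5, 10, 5, 15, 20, 10, 20),
  REF = c("C", "G", "G", "C", "T", "A", "C"),
  ALT = c("T", "A", "T", "A", "C", "G", "T")
)

genes <- tibble(
  CHR       = c("01", "01", "01", "02", "02", "03"),
  START     = c(2, 12, 25, 10, 25, 5),
  END       = c(8, 20, 30, 18, 35, 15),
  GENE_NAME = paste0("Gene", 1:6)
)

# Match rows where CHR is equal AND START <= POS <= END
# (between()'s default bounds are "[]", i.e. inclusive).
# left_join keeps every SNP in its original order; SNPs with no
# matching gene get NA in GENE_NAME.
annotated <- snps %>%
  left_join(genes, by = join_by(CHR, between(POS, START, END))) %>%
  select(CHR, POS, REF, ALT, GENE_NAME)

print(annotated)
```

If genes can overlap, a SNP falling in several genes would get one row per matching gene; passing `multiple = "first"` to `left_join()`, or summarising afterwards, would be one way to keep a single row per SNP.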