I'm new to R and I need advice on dealing with this problem:
I have 2 tables. The start of the tables are shown below:
Table 1:
SNP Gene Pval Best_SNP Best_Pval
rs2932538 ENSG00000007341 5.6007
rs10488631 ENSG00000064419 7.7461
rs12537284 ENSG00000064419 4.5544
rs3764650 ENSG00000064666 12.3401
rs10479002 ENSG00000072682 5.0141
rs6704644 ENSG00000072682 6.2306
rs2900211 ENSG00000072682 9.9022
Table 2:
Best_SNP Gene Best_Pval
rs9028922 ENSG00000007341 10.7892
rs8233293 ENSG00000064666 89.342
rs3234432 ENSG00000072682 32.321
rs2892334 ENSG00000064419 43.235
Table 1 contains the entire list of SNPs for each gene. Table 2 contains the best SNP and the corresponding best Pval for each gene that appears in Table 1.
I want to do the following: match each Gene from Table 1 to Table 2 and then copy the Best_SNP and Best_Pval from Table 2 and paste them in under Best_SNP and Best_Pval columns in Table 1 for that Gene. The tricky part is that in Table 1, each gene is repeated for a random, different number of rows. For example, the second gene ENSG00000064419 repeats for 2 rows and ENSG00000072682 repeats for 3 rows. So the code needs to filter through the names of the genes, and only copy down the Best_SNP and Best_Pval once for the same gene.
So for gene ENSG00000072682, out of the 3 rows, only the first row that appears to contain the gene needs to have the Best_SNP and Best_Pval columns filled in. I don't want the rest of the 2 repeated rows to also have the columns Best_SNP and Best_Pval filled in. It'll be easier to see where each gene starts and ends that way.