I have a data.frame; its first few rows are shown below:
             gene        snp    pval   best_snp best_pval
1 ENSG00000007341  rs2932538  5.6007 rs17030613   10.0542
2 ENSG00000064419 rs10488631  7.7461  rs4728142   24.6101
3 ENSG00000064419 rs12531711  7.7449  rs4728142   24.6101
4 ENSG00000064419 rs12537284  4.5544  rs4728142   24.6101
5 ENSG00000064666  rs3764650 12.3401  rs3752246    5.4001
6 ENSG00000072682 rs10479002  5.0141 rs12521868   21.1550
As shown, rows 2-4 contain the same gene. For repeated genes, I want to keep the best_snp and best_pval values only in the first row where the gene appears (row 2 here). In the later rows for that gene (rows 3 and 4 here), I want to delete the best_snp and best_pval values, since they are the same as in the row above.
If a gene is not repeated, its row should be left as it is.
Please keep in mind that the table is much larger than shown and that repeated genes occur at random places in it.
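
To make the desired result concrete, here is a minimal sketch of one way it could be done in base R. It assumes the data.frame is named df (a name chosen here only for illustration), that the columns are plain character and numeric vectors rather than factors, and that "deleting" a value means setting it to NA. It relies on duplicated(), which flags every occurrence of a value after the first, wherever it appears in the column.

```r
# Rebuild the six example rows; 'df' is a hypothetical name for the data.frame.
df <- data.frame(
  gene = c("ENSG00000007341", "ENSG00000064419", "ENSG00000064419",
           "ENSG00000064419", "ENSG00000064666", "ENSG00000072682"),
  snp = c("rs2932538", "rs10488631", "rs12531711",
          "rs12537284", "rs3764650", "rs10479002"),
  pval = c(5.6007, 7.7461, 7.7449, 4.5544, 12.3401, 5.0141),
  best_snp = c("rs17030613", "rs4728142", "rs4728142",
               "rs4728142", "rs3752246", "rs12521868"),
  best_pval = c(10.0542, 24.6101, 24.6101, 24.6101, 5.4001, 21.1550),
  stringsAsFactors = FALSE
)

# TRUE for every row whose gene has already appeared in an earlier row,
# regardless of whether the repeats are adjacent.
is_repeat <- duplicated(df$gene)

# Assumption: "delete" means replace with NA. Only the two best_* columns
# are touched; gene, snp and pval are left unchanged.
df[is_repeat, c("best_snp", "best_pval")] <- NA

print(df)
```

With this approach, rows 3 and 4 would end up with NA in best_snp and best_pval, while every other row keeps its original values. If empty strings are preferred over NA, best_pval would have to become a character column, which is why NA is used in this sketch.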