I have a dataframe of peaks that I mapped to genomic regions, which gives me each peak and its respective gene. Depending on the distance, two or more peaks can be mapped to the same gene, so I end up with something like this:
Peak annotation ENSEMBL log2FoldChange padj UP_DOWN
Peak13361 Distal Intergenic ENSG00000000457 3.458416 1.429138e-03 UP
Peak13362 Distal Intergenic ENSG00000000457 2.208152 3.153138e-10 UP
Peak13356 Distal Intergenic ENSG00000000457 -2.092536 1.693891e-03 DOWN
Peak13329 Distal Intergenic ENSG00000000460 3.862953 2.713778e-05 UP
Peak13331 Distal Intergenic ENSG00000000460 2.535419 3.064567e-02 UP
Peak2767 Promoter ENSG00000000938 2.664457 2.362797e-03 UP
Peak2769 Distal Intergenic ENSG00000000938 1.588538 3.678620e-07 UP
Peak2771 Distal Intergenic ENSG00000000938 1.818130 5.232734e-03 UP
Peak2772 Distal Intergenic ENSG00000000938 1.800501 2.102107e-02 UP
Peak15396 Distal Intergenic ENSG00000000971 1.577753 1.045814e-02 UP
For example, from these first three peaks:
Peak annotation ENSEMBL log2FoldChange padj UP_DOWN
Peak13361 Distal Intergenic ENSG00000000457 3.458416 1.429138e-03 UP
Peak13362 Distal Intergenic ENSG00000000457 2.208152 3.153138e-10 UP
Peak13356 Distal Intergenic ENSG00000000457 -2.092536 1.693891e-03 DOWN
I would like to keep only the peak with the highest significance (lowest padj):
Peak13362 Distal Intergenic ENSG00000000457 2.208152 3.153138e-10 UP
This is the logic I have to follow: if one ENSEMBL ID has multiple peaks, I have to keep the one with the highest significance (lowest padj).
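To make the rule concrete, here is a minimal sketch in plain Python that keeps, for each ENSEMBL ID, the row with the smallest padj. It is an illustration only: the rows are copied from the table above, the function name `most_significant_per_gene` is hypothetical, and it assumes that on a tie the first row encountered is kept.

```python
# Rows copied from the table above:
# (Peak, annotation, ENSEMBL, log2FoldChange, padj, UP_DOWN)
rows = [
    ("Peak13361", "Distal Intergenic", "ENSG00000000457", 3.458416, 1.429138e-03, "UP"),
    ("Peak13362", "Distal Intergenic", "ENSG00000000457", 2.208152, 3.153138e-10, "UP"),
    ("Peak13356", "Distal Intergenic", "ENSG00000000457", -2.092536, 1.693891e-03, "DOWN"),
    ("Peak13329", "Distal Intergenic", "ENSG00000000460", 3.862953, 2.713778e-05, "UP"),
    ("Peak13331", "Distal Intergenic", "ENSG00000000460", 2.535419, 3.064567e-02, "UP"),
    ("Peak2767", "Promoter", "ENSG00000000938", 2.664457, 2.362797e-03, "UP"),
    ("Peak2769", "Distal Intergenic", "ENSG00000000938", 1.588538, 3.678620e-07, "UP"),
    ("Peak2771", "Distal Intergenic", "ENSG00000000938", 1.818130, 5.232734e-03, "UP"),
    ("Peak2772", "Distal Intergenic", "ENSG00000000938", 1.800501, 2.102107e-02, "UP"),
    ("Peak15396", "Distal Intergenic", "ENSG00000000971", 1.577753, 1.045814e-02, "UP"),
]

ENSEMBL_COL = 2  # position of the ENSEMBL ID in each row
PADJ_COL = 4     # position of the adjusted p-value in each row


def most_significant_per_gene(rows):
    """Return one row per ENSEMBL ID: the row with the lowest padj.

    Hypothetical helper for illustration. On ties the first row seen is kept
    (an assumption; the question does not say how ties should be resolved).
    """
    best = {}
    for row in rows:
        gene = row[ENSEMBL_COL]
        # Replace the stored row only if this one is strictly more significant.
        if gene not in best or row[PADJ_COL] < best[gene][PADJ_COL]:
            best[gene] = row
    # Dicts preserve insertion order, so genes come out in first-seen order.
    return list(best.values())


selected = most_significant_per_gene(rows)
for row in selected:
    print(*row, sep="\t")
```

If the table lives in a pandas DataFrame `df`, the same selection can likely be written as `df.loc[df.groupby("ENSEMBL")["padj"].idxmin()]`, assuming the column names match the header shown above.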
Any suggestions or help would be really appreciated.