Eliminate specific rows in a dataset

Question

I have a data frame stored as a .csv file with 34,500 rows. It contains the results of an RNA-seq analysis. The problem is that some genes have multiple results; I need to keep one entry per gene, and that entry should be the one with the highest p-value. I have already trimmed my data down to just the "Gene symbol" and "p value" columns.

How can I remove the rows for genes that should be eliminated according to this rule? I will add a screenshot that shows my problem.

Thanks in advance.

Please add your data with `dput`. Use `dput(head(df,n))` not **images**. Also include sample code and what your rule is. — NelsonGon, Aug 05 '19 at 12:44
I could not write any code, so I did not add any. My rule is: for genes that have multiple entries, the row with the highest p-value should remain and the other entries should be eliminated. – Melih O., Aug 05 '19 at 12:50
OK, add your comment to your post and add data as suggested above or make a dummy data set. Include a sample of the expected output too. — NelsonGon, Aug 05 '19 at 12:51
Related, possible duplicate https://stackoverflow.com/q/24070714/680068 — zx8754, Aug 05 '19 at 13:18

akrun · Accepted Answer · 2019-08-05T13:24:48.440


Assuming that the blanks ("") are repeat entries of the previous non-blank 'Gene', change the blanks to NA (`na_if`), then use `fill` to replace each NA with the previous non-NA value. Then, grouped by 'Gene', take the row with the max value of 'pvalue'.

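library(dplyr)
library(tidyr)  # fill() comes from tidyr, not dplyr
df1 %>%
    mutate(Gene = na_if(Gene, "")) %>%  # turn blank gene names into NA
    fill(Gene) %>%                      # carry the previous gene name down
    group_by(Gene) %>%
    slice(which.max(pvalue))            # keep the max-pvalue row per gene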
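
For illustration, here is a minimal sketch of the same pipeline on a small made-up data set. The gene names and p-values are hypothetical, and the column names `Gene` and `pvalue` are assumed to match the real data. Note that `fill()` is provided by tidyr, so tidyr has to be installed and loaded alongside dplyr.

```r
library(dplyr)
library(tidyr)  # provides fill()

# Hypothetical example data: a blank Gene cell is assumed to be a repeat
# entry of the gene named in the closest non-blank row above it
df1 <- data.frame(
  Gene   = c("A1BG", "", "TP53", "BRCA1", "", ""),
  pvalue = c(0.01, 0.20, 0.03, 0.50, 0.70, 0.10),
  stringsAsFactors = FALSE
)

result <- df1 %>%
  mutate(Gene = na_if(Gene, "")) %>%  # "" becomes NA
  fill(Gene) %>%                      # NA takes the previous gene name
  group_by(Gene) %>%
  slice(which.max(pvalue)) %>%        # one row per gene: the highest p-value
  ungroup()

print(result)
```

Assigning the pipeline to `result` keeps the filtered rows for later use, and `ungroup()` drops the grouping so that later steps are not applied per gene by accident.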

edited Aug 05 '19 at 13:24

answered Aug 05 '19 at 13:02

akrun


Thanks, the solution looks like what I am looking for, but when I tried to run it, R could not find the fill function. I installed the dplyr package successfully, but it does not work. – Melih O. Aug 05 '19 at 13:24
@MelihO. You can just assign to an object `df1 <- df1 %>% mutate(...` or to a new one `df2 <- df1 %>% mutate(..` – akrun Aug 05 '19 at 13:31
