How can I select rows from a data frame with the highest value in a column using R?

Question

Given the data frame below, is there a way to select the row with the highest id for each gene?

gene_name <- c("AADACL2", "AADACL3", "AADACL4", "AADACL4", "AADACL4", "AADACL4", "AADACL4", "AADACL4")

target_id <- c(79.0524, 62.0098, 61.6708, 65.1106, 58.6207, 63.9706, 64.3735, 61.3232)

table <- data.frame(gene_name = gene_name, id = target_id)

I want a data frame with one row per gene, something like this:

gene_name_2 <- c("AADACL2", "AADACL3", "AADACL4")

target_id_2 <- c(79.0524, 62.0098, 65.1106)

table_2 <- data.frame(gene_name = gene_name_2, id = target_id_2)

My real data set is much bigger than this, so I need to do it for a lot of genes, and I just can't work out a way to do it.

Onyambu · Answer 1 · 2018-02-27T17:48:14.973


aggregate(. ~ gene_name, data = table, FUN = max)
  gene_name      id
1   AADACL2 79.0524
2   AADACL3 62.0098
3   AADACL4 65.1106
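
Note that `aggregate(. ~ gene_name, ...)` applies `max` to every non-grouping column independently, so if the real data frame has more columns than `id`, an output row can combine values taken from different input rows. A minimal base-R sketch that keeps whole rows instead, assuming the question's data; the `extra` column is hypothetical and only there to show that each returned row stays intact:

```r
# Question's data, plus a hypothetical `extra` column to track original rows
gene_name <- c("AADACL2", "AADACL3", "AADACL4", "AADACL4",
               "AADACL4", "AADACL4", "AADACL4", "AADACL4")
target_id <- c(79.0524, 62.0098, 61.6708, 65.1106,
               58.6207, 63.9706, 64.3735, 61.3232)
table <- data.frame(gene_name = gene_name, id = target_id,
                    extra = seq_along(gene_name))

# Option 1: keep every row whose id equals its gene's maximum (ties all kept)
is_max <- table$id == ave(table$id, table$gene_name, FUN = max)
top_rows <- table[is_max, ]

# Option 2: sort by gene, then by id descending, and keep the first row
# of each gene (exactly one row per gene, even if ids are tied)
sorted <- table[order(table$gene_name, -table$id), ]
one_per_gene <- sorted[!duplicated(sorted$gene_name), ]
rownames(one_per_gene) <- NULL

print(one_per_gene)
```

The `ave()` version is closest to "all rows with the maximum", while the `order()` plus `!duplicated()` version guarantees a single row per gene.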


library(tidyverse)

table %>% group_by(gene_name) %>% arrange(desc(id)) %>% top_n(1, id)
# A tibble: 3 x 2
# Groups:   gene_name [3]
  gene_name      id
     <fctr>   <dbl>
1   AADACL2 79.0524
2   AADACL4 65.1106
3   AADACL3 62.0098
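
In dplyr 1.0.0 and later, `top_n()` is superseded by `slice_max()`. A sketch of the same query with it, assuming dplyr 1.0.0 or newer is installed (only dplyr is loaded here rather than the whole tidyverse). Like `top_n()`, `slice_max()` keeps tied rows by default; `with_ties = FALSE` returns exactly one row per gene:

```r
library(dplyr)  # slice_max() requires dplyr >= 1.0.0

# Question's data
table <- data.frame(
  gene_name = c("AADACL2", "AADACL3", "AADACL4", "AADACL4",
                "AADACL4", "AADACL4", "AADACL4", "AADACL4"),
  id = c(79.0524, 62.0098, 61.6708, 65.1106,
         58.6207, 63.9706, 64.3735, 61.3232)
)

result <- table %>%
  group_by(gene_name) %>%
  slice_max(id, n = 1, with_ties = FALSE) %>%  # highest id per gene, no ties
  ungroup()

print(result)
```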

edited Feb 27 '18 at 17:48

answered Feb 27 '18 at 17:40

