
I have a large list of genes that I want to look up homologs for.

I also have a large data frame of potential homologs. The tenth column of this data frame contains a number describing how good the fit is: the larger the number, the better the fit.

I am trying to loop over this list of genes.

For each unique gene in the list, I want to select the best-fitting homolog.

The output should be a data frame with one row per gene, describing the best-fitting homolog.

1 Answer


Tidyverse solution, assuming you have a column called gene_id that identifies each gene and the fitting score is in a column called score:

library(tidyverse)

# For each gene, keep the row(s) with the highest score (tied rows are all kept)
df %>% group_by(gene_id) %>% filter(score == max(score)) %>% ungroup()
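If exactly one row per gene is needed even when scores tie, and only genes from the list should be kept, a variant along these lines may help. This is only a sketch on made-up data: the vector `genes`, the `homolog` column, and all values are hypothetical, and it loads just `dplyr` rather than the whole tidyverse. If the score column has no name yet, something like `names(df)[10] <- "score"` would name the tenth column first.

```r
library(dplyr)

# Hypothetical toy data: all names and values are invented for illustration.
genes <- c("geneA", "geneB")
df <- data.frame(
  gene_id = c("geneA", "geneA", "geneB", "geneB", "geneC"),
  homolog = c("h1", "h2", "h3", "h4", "h5"),
  score   = c(10, 25, 7, 7, 3),
  stringsAsFactors = FALSE
)

best <- df %>%
  filter(gene_id %in% genes) %>%   # keep only genes that are in the list
  group_by(gene_id) %>%
  slice(which.max(score)) %>%      # first row with the highest score per gene
  ungroup()
```

Because `which.max()` returns the position of the first maximum, a tie (as for `geneB` here) is resolved in favour of the row that appears first in the data frame.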
Peter
  • This seems like a tidy solution =,D. But whenever I try to load the package with library() after installing it, I get the following error: Error in library(tidyverse) : there is no package called ‘tidyverse’ – Jonas Engelhardt Jun 13 '19 at 19:28
  • @JonasEngelhardt You actually only need the `dplyr` package for that, not the whole tidyverse. – Peter Jun 14 '19 at 07:06