
I have a data frame with many genes in a column named "gene". Some of the genes appear more than once. I want to subset the data frame so that it contains only the genes that appear MORE than once. In other words, I want to REMOVE the rows whose value in the "gene" column is unique.

Phil

2 Answers


We can use subset with table in base R: get the frequency count of each gene with table, keep the names whose count is greater than 1, and use %in% to select the rows for those genes.

subset(df1, gene %in% names(which(table(gene) > 1)))

Another option is duplicated, applied in both directions so that the first occurrence of each repeated gene is kept as well:

subset(df1, duplicated(gene) | duplicated(gene, fromLast = TRUE))

Or using dplyr:

library(dplyr)
df1 %>%
   group_by(gene) %>%
   filter(n() > 1) %>%
   ungroup()
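
To check that the two base R approaches agree, here is a small sketch on a made-up data frame. The contents of `df1` and the names `by_table` and `by_dup` are assumptions for illustration only; the column is called `gene`, as in the question.

```r
# Toy data (hypothetical): A and C repeat, B and D appear once
df1 <- data.frame(
  gene  = c("A", "B", "A", "C", "C", "C", "D"),
  value = 1:7,
  stringsAsFactors = FALSE
)

# Approach 1: keep genes whose frequency count exceeds 1
by_table <- subset(df1, gene %in% names(which(table(gene) > 1)))

# Approach 2: flag duplicates scanning forward and backward,
# so the first occurrence of a repeated gene is flagged too
by_dup <- subset(df1, duplicated(gene) | duplicated(gene, fromLast = TRUE))

# Both keep rows 1, 3, 4, 5 and 6 in their original order
identical(by_table, by_dup)
```

Using duplicated(gene) on its own would drop rows 1 and 4, because the first occurrence of a value is never marked as a duplicate when scanning in one direction only.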
akrun

Here is another base R option, using subset + ave

subset(df, ave(seq_along(gene), gene, FUN = length) > 1)
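
To see what ave contributes here, the sketch below prints the per-row group size it computes. The data frame and the names `n_per_row` and `result` are made up for illustration. `seq_along(df$gene)` is passed as the first argument so that the counts come back as integers whether `gene` is stored as character or as a factor.

```r
# Toy data (hypothetical): A and C repeat, B appears once
df <- data.frame(
  gene  = c("A", "B", "A", "C", "C"),
  value = c(10, 20, 30, 40, 50)
)

# ave() returns a vector as long as its input, with each element
# replaced by FUN applied to that element's group (here, the group size)
n_per_row <- ave(seq_along(df$gene), df$gene, FUN = length)
n_per_row
# 2 1 2 2 2

# Keep rows whose gene occurs more than once
result <- subset(df, n_per_row > 1)
```

Because the counts are aligned with the rows, the comparison `n_per_row > 1` can be used directly as the row filter, with no lookup by gene name.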
ThomasIsCoding