Subset all rows in a dataframe that are NOT unique (based on a vector/column), or remove unique rows

Question

I have a data frame with many genes (the column is "gene"). Some of the genes appear more than once. I want to subset the data frame so that it only contains genes that appear MORE than once. In other words, I want to REMOVE the rows that are unique with respect to the "gene" column.
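To make the goal concrete, here is a small reproducible sketch with hypothetical data (the gene names and the value column are made up, not taken from the question): genes A and C repeat, while B and D occur exactly once, so only the rows for A and C should remain.

```r
# Hypothetical example data (not from the question):
# genes "A" and "C" repeat, "B" and "D" occur exactly once
df1 <- data.frame(
  gene  = c("A", "B", "A", "C", "D", "C", "C"),
  value = 1:7,
  stringsAsFactors = FALSE
)

# How often each gene occurs: A = 2, B = 1, C = 3, D = 1
counts <- table(df1$gene)
print(counts)

# Desired result: every row whose gene occurs more than once, in original order
expected <- df1[c(1, 3, 4, 6, 7), ]
print(expected)
```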

score 4 · Answer 1 · answered May 04 '21 at 20:50

We can use subset with table in base R: get the frequency count of each gene with table, build a logical expression that checks for a count greater than 1, extract the names of those genes, and use %in% to keep only the rows with those genes.

subset(df1, gene %in% names(which(table(gene) > 1)))
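The one-liner nests four steps. A sketch that spells them out one at a time, assuming hypothetical toy data in which genes A and C repeat:

```r
# Hypothetical toy data: "A" and "C" repeat, "B" and "D" are unique
df1 <- data.frame(
  gene  = c("A", "B", "A", "C", "D", "C", "C"),
  value = 1:7,
  stringsAsFactors = FALSE
)

# Step 1: frequency of each gene (A = 2, B = 1, C = 3, D = 1)
freq <- table(df1$gene)
# Step 2: logical test for "appears more than once"
is_repeated <- freq > 1
# Step 3: names of the genes where the test is TRUE ("A", "C")
repeated_genes <- names(which(is_repeated))
# Step 4: keep the rows whose gene is in that set
res_table <- subset(df1, gene %in% repeated_genes)
print(res_table)
```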

Another option is duplicated, checked from both ends so that the first occurrence of each repeated gene is kept as well:

subset(df1, duplicated(gene) | duplicated(gene, fromLast = TRUE))
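Why both calls are needed: duplicated on its own marks only the second and later occurrences, so it would drop the first row of every repeated gene. A small sketch on a hypothetical gene vector:

```r
# Hypothetical gene column: "A" and "C" repeat
gene <- c("A", "B", "A", "C", "D", "C", "C")

# Forward pass: TRUE for the 2nd, 3rd, ... occurrence of each gene
fwd <- duplicated(gene)
# Backward pass: TRUE for every occurrence except the last one
bwd <- duplicated(gene, fromLast = TRUE)
# Combined: TRUE for every occurrence of any gene that appears more than once
either <- fwd | bwd

print(which(fwd))     # forward pass alone misses positions 1 and 4
print(which(either))  # all rows for "A" and "C"
```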

Or using dplyr:

library(dplyr)
df1 %>%
   group_by(gene) %>%
   filter(n() > 1) %>%
   ungroup()

score 1 · Answer 2 · answered May 05 '21 at 09:33


Here is another base R option, using subset + ave to count the rows per gene:

subset(df, ave(seq_along(gene), gene, FUN = length) > 1)
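One caveat with ave: it returns a vector of the same type as its first argument. If the gene column itself is passed as that argument and it is character, the counts come back as strings ("2", "3", ...), so the > 1 test becomes a string comparison; a factor column cannot hold the counts at all. Counting over seq_along(gene) keeps the result integer. A sketch with hypothetical toy data, where gene is deliberately stored as a factor:

```r
# Hypothetical toy data; gene is stored as a factor on purpose
df <- data.frame(
  gene  = factor(c("A", "B", "A", "C", "D", "C", "C")),
  value = 1:7
)

# Per-row group size: x is an integer index vector, gene is only the grouping
n_per_row <- ave(seq_along(df$gene), df$gene, FUN = length)
print(n_per_row)

# Keep the rows whose gene occurs more than once
res_ave <- subset(df, ave(seq_along(gene), gene, FUN = length) > 1)
print(res_ave)
```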


ThomasIsCoding

