I have a dataframe with a "gene" column, and some of the genes appear more than once. I want to subset the dataframe so that it contains only the genes that appear MORE than once. In other words, I want to REMOVE the rows that are unique with respect to the "gene" column.
2 Answers
4
We can use subset with table in base R. Get the frequency count of 'genes' with table, create a logical expression that checks whether the count is greater than 1, retrieve the names of those genes, and use %in% to keep only the rows for those genes
subset(df1, genes %in% names(which(table(genes) > 1)))
Or another option is duplicated, applied from both directions so that the first occurrence of each repeated gene is kept as well
subset(df1, duplicated(genes)|duplicated(genes, fromLast = TRUE))
Or using dplyr
library(dplyr)
df1 %>%
  group_by(genes) %>%
  filter(n() > 1) %>%
  ungroup()
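To see how the two base R options behave, here is a minimal sketch on a made-up data frame. The gene names and the value column are invented for illustration, and the column is called genes to match the code above (the question calls it gene).

```r
# Toy data: names and values are hypothetical, for illustration only
df1 <- data.frame(
  genes = c("BRCA1", "TP53", "BRCA1", "EGFR", "TP53", "MYC"),
  value = c(1.2, 0.4, 2.1, 3.3, 0.9, 1.7),
  stringsAsFactors = FALSE
)

# Option 1: count each gene, then keep the genes whose count exceeds 1
counts <- table(df1$genes)
repeated <- names(which(counts > 1))
by_table <- subset(df1, genes %in% repeated)

# Option 2: flag repeats scanning forwards and backwards, so that every
# occurrence of a repeated gene is marked, including the first one
by_dup <- subset(df1, duplicated(genes) | duplicated(genes, fromLast = TRUE))

# Both keep rows 1, 2, 3 and 5 (BRCA1 and TP53) and drop EGFR and MYC
print(by_table)
print(identical(by_table, by_dup))
```

The table version reads naturally when you also want the counts themselves; the duplicated version avoids building the frequency table.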

akrun
1
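Here is another base R option, using subset + ave. Passing seq_along(gene) as the first argument keeps the group sizes numeric even when gene is a character or factor column
subset(df, ave(seq_along(gene), gene, FUN = length) > 1)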

ThomasIsCoding