I have a very large data frame that looks something like this:
Gene Sample1 Sample2
A 1 0
A 0 1
A 1 1
B 1 1
C 0 1
C 0 0
I want to keep only the rows whose Gene value appears more than once in the Gene column.
So the table would become:
Gene Sample1 Sample2
A 1 0
A 0 1
A 1 1
C 0 1
C 0 0
I've tried using subset(df, duplicated(df$Genes)) in R, but I think it left some non-duplicates behind, since the real gene names are more involved than A/B/C, such as WASH11 and KANSL-1.
Can this be done in R or in a Linux shell?
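
For reference, here is a minimal sketch of one way this might be done in base R. On its own, duplicated() flags only the second and later occurrences of a value, so the first row of each repeated gene is dropped. Combining a forward scan with a backward scan (fromLast = TRUE) flags every row whose gene occurs more than once. The data frame is just the example from above, and the column name Gene is an assumption taken from the table header (the attempt above uses Genes).

```r
# Example data from the question; the column name `Gene` is assumed.
df <- data.frame(
  Gene    = c("A", "A", "A", "B", "C", "C"),
  Sample1 = c(1, 0, 1, 1, 0, 0),
  Sample2 = c(0, 1, 1, 1, 1, 0),
  stringsAsFactors = FALSE
)

# duplicated() alone marks only the 2nd and later occurrences of a value.
# Scanning from both ends marks every row of a gene that occurs more than once.
is_repeated <- duplicated(df$Gene) | duplicated(df$Gene, fromLast = TRUE)

# Keep only rows belonging to repeated genes (drops the single B row here).
kept <- df[is_repeated, ]
print(kept)
```

Matching is done on the exact string, so names like WASH11 or KANSL-1 should be handled the same way as A/B/C, provided the values carry no stray whitespace or case differences.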