I have a long column (9,500 rows in Excel) containing gene IDs, many of which are repeated. I want to remove the duplicates.

ID
BXDC2
BXDC5
BXDC5
BZRPL1
BZRPL1
C10orf11
C10orf116
C10orf119
C10orf120
C10orf125
C10orf125

And I want the result to be:

ID
BXDC2
BXDC5
BZRPL1
C10orf11
C10orf116
C10orf119
C10orf120
C10orf125

Can anybody help me with an R script :-)?

akrun

1 Answer

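You can use duplicated() or unique(). Here I am assuming that the data frame is called df1 and the column is named 'ID'. duplicated() flags every repeat of a value after its first occurrence, so negating it keeps the first row for each ID: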

  df1[!duplicated(df1$ID), , drop = FALSE]

Or

  library(data.table)  # needs v1.9.4 or later
  unique(setDT(df1), by = 'ID')  # setDT() converts df1 to a data.table by reference
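
If it helps, here is a minimal reproducible sketch of the base R approach. The df1 below is made up from the IDs in the question. The trailing spaces and the trimws() step are my own assumption about what cells copied from Excel might look like, not something the question confirms. If your IDs carry stray whitespace, "BXDC5 " and "BXDC5" would otherwise be treated as different values.

```r
# Hypothetical sample data mirroring the IDs in the question.
# The trailing spaces imitate cells copied from Excel (an assumption).
df1 <- data.frame(
  ID = c("BXDC2", "BXDC5 ", "BXDC5", "BZRPL1", "BZRPL1  ",
         "C10orf11", "C10orf116", "C10orf119", "C10orf120",
         "C10orf125", "C10orf125"),
  stringsAsFactors = FALSE
)

# Strip leading/trailing whitespace so "BXDC5 " and "BXDC5" compare equal.
df1$ID <- trimws(df1$ID)

# duplicated() marks the second and later occurrences of each ID,
# so negating it keeps only the first occurrence.
dedup <- df1[!duplicated(df1$ID), , drop = FALSE]

print(dedup$ID)
```

With this sample, dedup should hold one row per distinct ID, in order of first appearance. The drop = FALSE keeps the result a data frame even though it has a single column.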
akrun