How to remove identical rows in a large data.frame

Question

I have a data.frame that looks like this:

GN  PN  
a   3.4   
a   3.4   
a   9.8   
d   8.4   
e   9 
e   6.5

I would like the following output:

GN  PN  
a   3.4   
a   9.8   
d   8.4   
e   9 
e   6.5

(The identical rows should be removed.)

I'm trying to use the code posted in "multiple columns comparison", but without success: the replicated line (a 3.4) still remains. I have a large data.frame (about 66,000 rows and 10 columns).

The real case:

 GN     SP                PN
A1CF   52573692   TCGA-B6-A0RS-01A-11D-A099
A1CF   52595854   TCGA-BH-A0HP-01A-12D-A099 
A1CF   52595854   TCGA-BH-A0HP-01A-12D-A099
A1CF   52595937   TCGA-BH-A18P-01A-11D-A12B
A2BP1  7568361    TCGA-D8-A1JN-01A-11D-A13L
A2BP1  7102099    TCGA-E2-A1BC-01A-11D-A14G
A2BP1  7102099    TCGA-E2-A1BC-01A-11D-A14G
A2BP1  7383011    TCGA-AR-A1AJ-01A-21D-A12Q
A2BP1  7383011    TCGA-AR-A1AJ-01A-21D-A12Q
A2BP1  7568188    TCGA-BH-A18J-01A-11D-A12B
A2BP1  7629860    TCGA-AO-A03O-01A-11W-A019
A2BP1  7629860    TCGA-AO-A03O-01A-11W-A019

Answered here: http://stackoverflow.com/questions/9944816/unique-on-a-dataframe-with-only-selected-columns — harkmug, Mar 22 '13 at 15:49

score 1 · Accepted Answer · answered Mar 22 '13 at 15:18

Just use:

 unique(df)

Which gives:

  GN  PN
1  a 3.4
3  a 9.8
4  d 8.4
5  e 9.0
6  e 6.5
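
For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the small example from the question and runs `unique()` on it. The values are copied from the post; the column types (character for GN, numeric for PN) are an assumption, since the question does not show how the data.frame was created.

```r
# Rebuild the small example from the question (column types are assumed).
df <- data.frame(
  GN = c("a", "a", "a", "d", "e", "e"),
  PN = c(3.4, 3.4, 9.8, 8.4, 9, 6.5),
  stringsAsFactors = FALSE
)

# unique() on a data.frame drops every row that repeats an earlier row
# in all columns; the first occurrence is kept.
dedup <- unique(df)
print(dedup)

# The original row names survive, so the gap shows which row was dropped.
nrow(dedup)      # 5
rownames(dedup)  # "1" "3" "4" "5" "6"
```

Note that the row names are not renumbered; `rownames(dedup) <- NULL` resets them if that matters.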

juba

Unfortunately it does not work.. – Fuv8 Mar 22 '13 at 15:21
Could you please elaborate a bit more? It works well on your example here... – juba Mar 22 '13 at 15:23

score 1 · Answer 2 · answered Mar 22 '13 at 15:23

Maybe you can try new.df <- subset(df, !duplicated(df))
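
A sketch of the `duplicated()` approach on a few rows of the real case follows. It also illustrates one possible reason why rows that look identical are not removed: invisible differences such as trailing whitespace. That cause is an assumption, not something confirmed in this thread, and the trailing space in the second row below is put there deliberately for illustration.

```r
# Four rows from the real case. The trailing space in the second PN value
# is hypothetical, added to show how an invisible difference defeats duplicated().
real <- data.frame(
  GN = c("A1CF", "A1CF", "A1CF", "A1CF"),
  SP = c(52573692, 52595854, 52595854, 52595937),
  PN = c("TCGA-B6-A0RS-01A-11D-A099",
         "TCGA-BH-A0HP-01A-12D-A099 ",
         "TCGA-BH-A0HP-01A-12D-A099",
         "TCGA-BH-A18P-01A-11D-A12B"),
  stringsAsFactors = FALSE
)

# Rows 2 and 3 print the same but differ by one space, so nothing is flagged.
n_dups_before <- sum(duplicated(real))   # 0

# Trim leading/trailing whitespace in the character columns only.
is_chr <- vapply(real, is.character, logical(1))
real[is_chr] <- lapply(real[is_chr], trimws)

# Keep the rows that are not a repeat of an earlier row.
new.df <- real[!duplicated(real), ]
nrow(new.df)                             # 3
```

If the columns are factors rather than character vectors, the same idea applies after converting them with `as.character()`.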

Duck

just posted the real case! – Fuv8 Mar 22 '13 at 15:33
