duplicates rows in R

Question

I am trying to subset a big dataset that I am working with in R. I normally use the unique command to get the dataset without duplicates according to column A. Here I would like to do something a little different: I would like to remove even the original row if its value in column A is duplicated. Here's a sample of what the data looks like:

Name   A    B    C    D    E
JHA    2    45   2    32   20
OMI    2    49   5    321  5
FIG    3    17   5    14   10
GJI    4    35   6    25   22
IJF    5    25   7    36   32
OPI    4    10   8    66   25

and I would like to make it look like this:

Name  A    B    C    D    E
FIG   3    17   5    14   10
IJF   5    25   7    36   32

Is there a command that can do this in one go?

Many thanks,

Related: [*How can I remove all duplicates so that NONE are left in a data frame?*](https://stackoverflow.com/q/13763216/2204410) — Jaap, Feb 11 '18 at 08:21

score 4 · Accepted Answer · answered Jul 07 '13 at 18:57


You can use duplicated like this:

dat[!(duplicated(dat$A) | duplicated(dat$A, fromLast = TRUE)), ]
  A  B C  D  E
3 3 17 5 14 10
5 5 25 7 36 32
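
To see why both calls are needed: duplicated() only flags the second and later occurrences of a value, so a single forward scan would still keep the first copy. The backward scan (fromLast = TRUE) flags the earlier copies instead. Below is a minimal sketch of that; the way dat is rebuilt from the question's sample is an assumption made for illustration, and fwd, bwd and res are hypothetical names.

```r
# Rebuild the question's sample data (assumed construction, for illustration only)
dat <- data.frame(
  Name = c("JHA", "OMI", "FIG", "GJI", "IJF", "OPI"),
  A = c(2, 2, 3, 4, 5, 4),
  B = c(45, 49, 17, 35, 25, 10),
  C = c(2, 5, 5, 6, 7, 8),
  D = c(32, 321, 14, 25, 36, 66),
  E = c(20, 5, 10, 22, 32, 25),
  stringsAsFactors = FALSE
)

fwd <- duplicated(dat$A)                   # flags later copies: rows 2 and 6
bwd <- duplicated(dat$A, fromLast = TRUE)  # flags earlier copies: rows 1 and 4

# A row survives only if neither scan flags it
res <- dat[!(fwd | bwd), ]
res$Name  # "FIG" "IJF"
```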


agstudy


Arun · Answer 2 · 2013-07-07T19:18:46.017

3

Another way:

df[!df$A %in% df$A[duplicated(df$A)], ]
  Name A  B C  D  E
3  FIG 3 17 5 14 10
5  IJF 5 25 7 36 32

(or)

df[!with(df, A %in% A[duplicated(A)]), ]

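(or, with setdiff; its result is a set of A values, so it has to be matched against A with %in% rather than used directly as row indices)

df[with(df, A %in% setdiff(A, A[duplicated(A)])), ]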
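
A caveat about any setdiff() approach: setdiff() returns values of A, not row positions. Using those values directly as row indices only gives the right rows when the values happen to coincide with the positions, as they do in the question's sample (values 3 and 5 sit in rows 3 and 5). Below is a small sketch of the difference; df2 and its contents are hypothetical data, not from the original post.

```r
# Hypothetical data where the values of A do not match row positions
df2 <- data.frame(
  Name = c("a", "b", "c", "d"),
  A = c(10, 10, 20, 30),
  stringsAsFactors = FALSE
)

# Values of A that occur exactly once: 20 and 30
uniq_vals <- setdiff(df2$A, df2$A[duplicated(df2$A)])

# Indexing rows by these values asks for rows 20 and 30, which do not exist,
# so the result is two rows of NA
bad <- df2[uniq_vals, ]

# Matching the values against A selects the intended rows
good <- df2[df2$A %in% uniq_vals, ]
```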

If you're interested in a data.table solution, you could do:

library(data.table)
dt <- as.data.table(df)
# .I holds the row numbers and .N the group size; keep rows whose A group has exactly one row
dt[dt[, .I[.N == 1], by = A]$V1]
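
The data.table line works by counting rows per value of A and keeping only the groups of size one. The same count-per-group idea can be sketched in base R with ave(). This is an alternative shown for illustration, not part of the original answer; the df construction is assumed from the question's sample (only two columns are rebuilt), and n and res are hypothetical names.

```r
# Assumed reconstruction of part of the question's sample data
df <- data.frame(
  Name = c("JHA", "OMI", "FIG", "GJI", "IJF", "OPI"),
  A = c(2, 2, 3, 4, 5, 4),
  stringsAsFactors = FALSE
)

# For every row, ave() returns the size of that row's A group
n <- ave(seq_along(df$A), df$A, FUN = length)

# Keep only rows whose A value occurs exactly once
res <- df[n == 1, ]
```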

edited Jul 07 '13 at 19:18

answered Jul 07 '13 at 19:00

Arun

