I am trying to subset a big dataset that I am working with in R. I normally use the unique command to get the dataset without duplicates, according to column A. Here I would like to do something a little different: I want to remove every row whose value in column A is duplicated, including the first occurrence. Here's a sample of what the data looks like:

Name   A    B    C    D    E
JHA    2    45   2    32   20
OMI    2    49   5    321  5
FIG    3    17   5    14   10
GJI    4    35   6    25   22
IJF    5    25   7    36   32
OPI    4    10   8    66   25

and I would like to make it look like this

Name  A    B    C    D    E
FIG   3    17   5    14   10
IJF   5    25   7    36   32

Is there a command that can do this in one go?

Many thanks,

Error404
  • Related: [*How can I remove all duplicates so that NONE are left in a data frame?*](https://stackoverflow.com/q/13763216/2204410) – Jaap Feb 11 '18 at 08:21

2 Answers

You can use duplicated like this:

dat[!(duplicated(dat$A)| 
      duplicated(dat$A,fromLast=TRUE)),]
  A  B C  D  E
3 3 17 5 14 10
5 5 25 7 36 32
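
To see why both calls are needed, here is a minimal, self-contained sketch. It assumes the sample data from the question rebuilt by hand as `dat`; the helper names `dup_forward`, `dup_backward` and `result` are only illustrative and not part of the original answer.

```r
# Rebuild the sample data from the question (typed in by hand for illustration).
dat <- data.frame(
  Name = c("JHA", "OMI", "FIG", "GJI", "IJF", "OPI"),
  A = c(2, 2, 3, 4, 5, 4),
  B = c(45, 49, 17, 35, 25, 10),
  C = c(2, 5, 5, 6, 7, 8),
  D = c(32, 321, 14, 25, 36, 66),
  E = c(20, 5, 10, 22, 32, 25)
)

# duplicated() flags only the second and later occurrences of a value.
dup_forward <- duplicated(dat$A)

# With fromLast = TRUE the scan runs bottom-up, so the earlier occurrences are flagged.
dup_backward <- duplicated(dat$A, fromLast = TRUE)

# A row is kept only if neither scan flags it, i.e. its A value occurs exactly once.
result <- dat[!(dup_forward | dup_backward), ]
print(result)
```

Either scan on its own always leaves one copy of each repeated value; combining the two with `|` marks every member of a duplicated group.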
agstudy

Another way:

df[!df$A %in% df$A[duplicated(df$A)], ]
  Name A  B C  D  E
3  FIG 3 17 5 14 10
5  IJF 5 25 7 36 32

(or)

df[!with(df, A %in% A[duplicated(A)]), ]

(or)

df[df$A %in% with(df, setdiff(A, A[duplicated(A)])), ]

If you're interested in a data.table solution, you could do:

library(data.table)
dt <- as.data.table(df)
# .N is the group size and .I the row numbers; keep rows whose A value occurs once
dt[dt[, .I[.N == 1], by = A]$V1]
Arun