My dummy data looks like this:

> head(dummy)
            C1          C2
[1,]         1           1
[2,]         1           2
[3,]         1           3
[4,]         2           3
[5,]         2           4
[6,]         2           5

The value 3 is duplicated in C2, but the rows themselves are unique in the data frame. I want to remove all duplicates according to C2 and keep only the first (or last) occurrence according to C1.

Example of what I want:

> remove duplicates leave first in C1
            C1          C2
[1,]         1           1
[2,]         1           2
[3,]         1           3
[5,]         2           4
[6,]         2           5
# filtered    [4,]   2    3

Or

> remove duplicates leave last in C1
            C1          C2
[1,]         1           1
[2,]         1           2
[4,]         2           3
[5,]         2           4
[6,]         2           5
# filtered   [3,]   1    3
Jaap
pogibas

1 Answer

If dat is the dataset (a data frame), this keeps the first occurrence of each C2 value:

dat[with(dat, !duplicated(C2)),]
 C1 C2
1  1  1
2  1  2
3  1  3
5  2  4
6  2  5


To keep the last occurrence instead, use fromLast=TRUE:

dat[with(dat, !duplicated(C2, fromLast=TRUE)),]
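
For anyone who wants to try this end to end, here is a minimal sketch that rebuilds the six rows from the question and runs both variants. It assumes those six rows are the whole data set and that dat is a data frame. The names first_kept, last_kept, m and first_kept_m are only illustrative. Since the [1,] row labels in the question suggest the real object may be a matrix, the last lines show the same idea with plain column indexing, which does not need with().

```r
# Rebuild the six rows shown in the question (assumed to be the full data set)
dat <- data.frame(C1 = c(1, 1, 1, 2, 2, 2),
                  C2 = c(1, 2, 3, 3, 4, 5))

# duplicated() follows row order, so sort by C1 first if the data
# might not already be ordered (it is already ordered here)
dat <- dat[order(dat$C1), ]

# Keep the first row for each C2 value (drops row 4)
first_kept <- dat[!duplicated(dat$C2), ]

# Keep the last row for each C2 value (drops row 3)
last_kept <- dat[!duplicated(dat$C2, fromLast = TRUE), ]

# Same idea for a matrix with column names instead of a data frame
m <- as.matrix(dat)
first_kept_m <- m[!duplicated(m[, "C2"]), , drop = FALSE]
```

Note that "first" and "last" here mean first and last in row order, which is why the sort by C1 matters when the input is not already ordered.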
akrun