Remove duplicates based on 2 columns of data

Question

I have a set of data:

x <- c(rep("A", 3), rep("B", 3), rep("C", 2))
y <- c(1, 1, 2, 4, 1, 1, 2, 2)
z <- c(rep("E", 1), rep("F", 4), rep("G", 3))
df <- data.frame(x, y, z)

I only want to remove a row as a duplicate if both column x and column z are duplicated. In this case, after applying the code, rows 2 and 3 should be reduced to one row, rows 4 and 5 to one row, and rows 7 and 8 to one row. How can I do this?

Don't use `c` as a name for an object, as `c()` is a base function. — LAP, Feb 08 '19 at 07:17

LAP · Answer 1 · 2019-02-08T07:58:46.230

You can use a simple condition to subset your data:

x <- c(rep("A", 3), rep("B", 3), rep("C", 2))
y <- c(1, 1, 2, 4, 1, 1, 2, 2)
z <- c(rep("A", 1), rep("B", 4), rep("C", 3))
df <- data.frame(x, y, z)

df
df[!df$x == df$z, ] # the ! excludes all rows for which x == z is TRUE

  x y z
2 A 1 B
3 A 2 B
6 B 1 C

Edit: As @RonakShah commented, to drop rows that repeat an earlier row's combination of x and z, use

df[!duplicated(df[c("x", "z")]), ]

or, selecting the same columns by position,

df[!duplicated(df[c(1, 3)]), ]

  x y z
1 A 1 A
2 A 1 B
4 B 4 B
6 B 1 C
7 C 2 C
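
For reference, here is a minimal sketch of the same `duplicated()` approach applied to the data from the edited question (where z holds "E", "F", "G"). The names `dup`, `first_kept` and `last_kept` are only illustrative, and the `fromLast = TRUE` variant is an optional extra in case the last row of each pair should be kept instead of the first.

```r
# Data as given in the question
x <- c(rep("A", 3), rep("B", 3), rep("C", 2))
y <- c(1, 1, 2, 4, 1, 1, 2, 2)
z <- c(rep("E", 1), rep("F", 4), rep("G", 3))
df <- data.frame(x, y, z)

# TRUE marks a row whose (x, z) pair already appeared in an earlier row
dup <- duplicated(df[c("x", "z")])
dup

# Keep the first row of each (x, z) pair
first_kept <- df[!dup, ]
first_kept

# Keep the last row of each (x, z) pair instead
last_kept <- df[!duplicated(df[c("x", "z")], fromLast = TRUE), ]
last_kept
```

With this data, `first_kept` should contain rows 1, 2, 4, 6 and 7, so rows 2-3, 4-5 and 7-8 are each reduced to one row as asked. Column y is ignored when deciding what counts as a duplicate, so the y value of the dropped row is lost.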

edited Feb 08 '19 at 07:58

answered Feb 08 '19 at 07:20

LAP


Sorry, I phrased my question wrongly; please refer to the edited question. Thank you. – Chris Feb 08 '19 at 07:31
