
I have a set of data:

x <- c(rep("A", 3), rep("B", 3), rep("C",2))
y <- c(1,1,2,4,1,1,2,2)
z <- c(rep("E", 1), rep("F", 4), rep("G",3))
df <-data.frame(x,y,z)

I only want to remove a row if it is a duplicate in both column x and column z. In this case, after applying the code, rows 2 and 3 should be reduced to one row, rows 4 and 5 to one row, and rows 7 and 8 to one row. How can I do this?

Chris

1 Answer


You can use a simple condition to subset your data:

x <- c(rep("A", 3), rep("B", 3), rep("C",2))
y <- c(1,1,2,4,1,1,2,2)
z <- c(rep("A", 1), rep("B", 4), rep("C",3))
df <-data.frame(x,y,z)

df
df[!df$x == df$z,] # the ! excludes all rows for which x == z is TRUE

  x y z
2 A 1 B
3 A 2 B
6 B 1 C

Edit: As @RonakShah commented, to drop rows that duplicate an earlier row in both x and z, use

df[!duplicated(df[c("x", "z")]),]

or, selecting the same two columns by position,

df[!duplicated(df[c(1, 3)]),]

  x y z
1 A 1 A
2 A 1 B
4 B 4 B
6 B 1 C
7 C 2 C
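
For reference, here is a minimal sketch of the same `duplicated()` approach applied to the data from the edited question (z built from "E", "F", "G"). The name `deduped` is only an illustrative choice, and the expected rows in the comments follow from the question's own description of which rows are duplicates.

```r
# Data from the edited question
x <- c(rep("A", 3), rep("B", 3), rep("C", 2))
y <- c(1, 1, 2, 4, 1, 1, 2, 2)
z <- c(rep("E", 1), rep("F", 4), rep("G", 3))
df <- data.frame(x, y, z)

# duplicated() marks a row TRUE when its (x, z) pair already appeared
# in an earlier row; negating it keeps the first occurrence of each pair
deduped <- df[!duplicated(df[c("x", "z")]), ]
deduped
#   x y z
# 1 A 1 E
# 2 A 1 F
# 4 B 4 F
# 6 B 1 G
# 7 C 2 G
```

Rows 3, 5 and 8 are dropped, so each of the pairs 2/3, 4/5 and 7/8 is left with one row. Note that column y is ignored when checking for duplicates, so the y value of the first row in each pair is the one that survives; `duplicated(..., fromLast = TRUE)` would keep the last row of each pair instead.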
LAP
  • Sorry, I phrased my question wrongly; please refer to the edited question. Thank you. – Chris Feb 08 '19 at 07:31