I have tried several functions, including compare and all.equal, but I can't find a way to test whether two variables are the same.

For context, I have a data.frame which in some cases has a duplicate result. I have tried copying the data.frame so I can compare it with itself. I would like to remove the duplicates.

One approach I considered was to subtract row A of data frame 1 from row B of data frame 2 and, if the difference is zero, remove one of the two rows.

Is there an approach I can use to do this without copying my data?

Any help would be great, I'm new to R coding.

Ian Campbell
    An example of the first few rows of the two tables would be helpful. You could also have a look at https://dplyr.tidyverse.org/reference/lead-lag.html – Waldi Jun 17 '20 at 15:21

1 Answer

Suppose I had a data.frame named data:

data
  Col1 Col2
A    1    3
B    2    7
C    2    7
D    2    8
E    4    9
F    5   12

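I can use the duplicated function to flag rows that repeat an earlier row, then negate the result to keep only the first occurrence of each. This works directly on the original data.frame, so no copy is needed: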

data[!duplicated(data),]
  Col1 Col2
A    1    3
B    2    7
D    2    8
E    4    9
F    5   12
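
If it helps to see which rows are being flagged before dropping them, here is a small sketch using the same sample data. The variable names dup and deduped are just for illustration. As far as I know, unique() on a data.frame is built on duplicated(), so it should give the same result as the subsetting above:

```r
# Same sample data as in this answer
data <- data.frame(Col1 = c(1, 2, 2, 2, 4, 5), Col2 = c(3, 7, 7, 8, 9, 12))
rownames(data) <- LETTERS[1:6]

# Logical vector: TRUE marks a row that repeats an earlier row
dup <- duplicated(data)
rownames(data)[dup]   # "C"

# unique() keeps the first occurrence of each distinct row
deduped <- unique(data)

# Check that both approaches keep the same rows
identical(deduped, data[!dup, ])
```

Looking at the flagged rows first is a cheap sanity check before overwriting data with the de-duplicated version.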

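I can also perform the same action on a single column. Note that this keeps only the first row for each distinct value of Col1, so row D is dropped even though its Col2 value differs: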

data[!duplicated(data$Col1),]
  Col1 Col2
A    1    3
B    2    7
E    4    9
F    5   12
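
Two variations that may be useful, sketched with the same sample data (keep_last and by_cols are names I made up for the example): the fromLast argument of duplicated keeps the last occurrence instead of the first, and passing a subset of columns treats rows as duplicates only when all of those columns match.

```r
# Same sample data as in this answer
data <- data.frame(Col1 = c(1, 2, 2, 2, 4, 5), Col2 = c(3, 7, 7, 8, 9, 12))
rownames(data) <- LETTERS[1:6]

# Keep the LAST row for each Col1 value instead of the first
keep_last <- data[!duplicated(data$Col1, fromLast = TRUE), ]
rownames(keep_last)   # "A" "D" "E" "F"

# Rows count as duplicates only when every listed column matches
by_cols <- data[!duplicated(data[, c("Col1", "Col2")]), ]
rownames(by_cols)     # "A" "B" "D" "E" "F"
```

With only two columns in the sample data, the second call is equivalent to checking the whole row; it becomes more useful when the data.frame has extra columns that should be ignored when deciding what counts as a duplicate.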

Sample Data

data <- data.frame(Col1 = c(1,2,2,2,4,5), Col2 = c(3,7,7,8,9,12))
rownames(data) <- LETTERS[1:6]
Ian Campbell