I have a data set with three variables: patient, arm and bestres.

01 A CR 
02 A PD 
03 B PR 
04 B CR 
05 C SD 
06 C SD 
07 C PD 
01 A CD 
03 B PD 

I want to remove the rows that repeat an earlier patient/arm combination, so that the result looks like this:

patient   arm   bestres
1         A      CR 
2         A      PD 
3         B      PR 
4         B      CR 
5         C      SD 
6         C      SD 
7         C      PD

How can I remove duplicates based on two variables?

mpromonet
suresh
If you Google your last sentence, "How to remove duplicates based on two variables", the first hit has an answer: http://stackoverflow.com/questions/13742446/duplicates-in-multiple-columns – Sam Firke May 19 '16 at 17:37

1 Answer

When passed a data.frame, duplicated() returns TRUE for rows which duplicate prior rows in their entirety. Hence we can achieve your requirement by passing just the target columns to it:

df <- data.frame(patient = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 3L), arm = c('A', 'A', 'B', 'B', 'C', 'C', 'C', 'A', 'B'),
                 bestres = c('CR', 'PD', 'PR', 'CR', 'SD', 'SD', 'PD', 'CD', 'PD'), stringsAsFactors = FALSE)
# Keep only the first row seen for each patient/arm combination.
df[!duplicated(df[, c('patient', 'arm')]), ]
##   patient arm bestres
## 1       1   A      CR
## 2       2   A      PD
## 3       3   B      PR
## 4       4   B      CR
## 5       5   C      SD
## 6       6   C      SD
## 7       7   C      PD
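
Note that the call above keeps the first row of each patient/arm pair, so the later bestres values (CD for patient 1, PD for patient 3) are dropped without notice. As a minimal sketch, assuming you may instead want the last row of each pair or want to review the conflicting rows first, the `fromLast` argument of `duplicated()` can be used. The names `keys`, `last_kept` and `conflicts` are illustrative only and do not come from the question.

```r
# Same sample data as in the answer above.
df <- data.frame(
  patient = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L, 3L),
  arm     = c("A", "A", "B", "B", "C", "C", "C", "A", "B"),
  bestres = c("CR", "PD", "PR", "CR", "SD", "SD", "PD", "CD", "PD"),
  stringsAsFactors = FALSE
)

# The two columns that define a duplicate.
keys <- df[, c("patient", "arm")]

# Keep the LAST row of each patient/arm pair instead of the first.
last_kept <- df[!duplicated(keys, fromLast = TRUE), ]

# List every row that belongs to a repeated patient/arm pair, so that
# conflicting bestres values can be reviewed before any row is dropped.
conflicts <- df[duplicated(keys) | duplicated(keys, fromLast = TRUE), ]
```

Here `conflicts` holds original rows 1, 3, 8 and 9, which are the two patient/arm pairs that occur more than once.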

Beware this warning from the documentation:

The data frame method works by pasting together a character representation of the rows separated by \r, so may be imperfect if the data frame has characters with embedded carriage returns or columns which do not reliably map to characters.

bgoldst