I am trying to figure out how to subset my dataset according to repeated values of the variable s, while also taking into account the id associated with each row.

Suppose my dataset is:

dat <- read.table(text = "
        id     s          
        1      2     
        1      2     
        1      1      
        1      3     
        1      3     
        1      3     
        2      3     
        2      3     
        3      2     
        3      2", 
header=TRUE)

For each id, I would like to keep only the first row for which s = 3, leaving all rows with other values of s untouched. The result with dat would be:

        id     s          
        1      2     
        1      2     
        1      1      
        1      3         
        2      3         
        3      2     
        3      2

I have tried to use both duplicated() and which(), with the idea of passing the result to subset() afterwards, but I am not getting anywhere. The main problem is that it is not sufficient to isolate the first row of each s = 3 "block", because in some cases (as here between id = 1 and id = 2) the 3's run on from one id to the next. Which strategy would you adopt?

flodel
Stefano Lombardi

1 Answer

Like this:

subset(dat, s != 3 | s == 3 & !duplicated(dat)) 
#    id s
# 1   1 2
# 2   1 2
# 3   1 1
# 4   1 3
# 7   2 3
# 9   3 2
# 10  3 2
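
To see why this works, it may help to look at the intermediate logical vector. When given a data frame, duplicated() flags a row as TRUE if an identical row (here, the same id and s pair) has already appeared above it. The sketch below just spells that out step by step; the names dup, keep and result are mine, for illustration only.

```r
dat <- read.table(text = "
id s
1  2
1  2
1  1
1  3
1  3
1  3
2  3
2  3
3  2
3  2",
header = TRUE)

# TRUE where the same (id, s) combination already occurred in an earlier row
dup <- duplicated(dat)

# keep every row with s != 3, plus the first s == 3 row within each id
keep <- dat$s != 3 | (dat$s == 3 & !dup)

result <- dat[keep, ]
# rows 1, 2, 3, 4, 7, 9 and 10 are kept
```

Because the comparison is on whole rows, the 3's of id = 2 are not treated as duplicates of the 3's of id = 1, which is what handles the overlap between ids.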

Note that subset() relies on non-standard evaluation and can be dangerous to work with, especially inside functions (see the question "Why is `[` better than `subset`?"), so the longer but safer version would be:

dat[dat$s != 3 | dat$s == 3 & !duplicated(dat), ]
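
One caveat, assuming your real data has more columns than id and s: duplicated(dat) compares entire rows, so any extra column that differs between rows would stop the repeated 3's from being flagged. In that case you could restrict the check to the two key columns. A minimal sketch, where dat2 and its extra column x are hypothetical:

```r
# hypothetical data with an extra column x that differs on every row
dat2 <- data.frame(id = c(1, 1, 1, 2, 2),
                   s  = c(2, 3, 3, 3, 3),
                   x  = 1:5)

# only look at id and s when deciding what counts as a repeat
first_of_pair <- !duplicated(dat2[c("id", "s")])

res2 <- dat2[dat2$s != 3 | first_of_pair, ]
# rows 1, 2 and 4 are kept
```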
flodel