
Suppose mydata has three columns. Each ID has multiple rows, and each row has a corresponding "case" value (a character). I need to count the number of a's for each ID: if the count is >= 3, delete all rows for that ID; otherwise, keep them.

What I have:

mydata <- data.frame(id=c(1,1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,4), case=c("a","b","c","a","a","a","a","c","c","a","a","a","c","a","b","c","a","b"), value=c(1,1,1,1,1,1,1,2,2,2,2,1,1,1,1,2,2,2))

What I need:

   id case value
6   2    a     1
7   2    a     1
8   2    c     2
9   2    c     2
14  4    a     1
15  4    b     1
16  4    c     2
17  4    a     2
18  4    b     2
Jose Luis
user9292
  • I think this might cover it: http://stackoverflow.com/questions/18302610/remove-ids-that-occur-x-times-r, or this one too: http://stackoverflow.com/questions/24503279/return-df-with-a-columns-values-that-occur-more-then-once?lq=1 – thelatemail Dec 04 '14 at 22:31
  • Probably not? Those questions delete IDs based on the number of rows for each ID, whereas I want to count the number of a's for each ID. – user9292 Dec 04 '14 at 22:33
  • OK, fair enough. I'll see if I can whip something up. – thelatemail Dec 04 '14 at 22:34
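As a quick check on the expected output, a cross-tabulation of id against case shows how many "a" rows each ID has. This is a minimal base R sketch using table(); it only counts and does not filter anything.

```r
# Example data from the question
mydata <- data.frame(
  id    = c(1,1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,4),
  case  = c("a","b","c","a","a","a","a","c","c","a","a","a","c","a","b","c","a","b"),
  value = c(1,1,1,1,1,1,1,2,2,2,2,1,1,1,1,2,2,2)
)

# Cross-tabulate id against case; the "a" column is the per-ID count of a's
tab <- table(mydata$id, mydata$case)
tab[, "a"]
```

IDs 1 and 3 each have three a's and should be dropped; IDs 2 and 4 have two each and should be kept, which matches the desired output above.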

2 Answers


There's probably a host of solutions, but here's one using base R's ave:

mydata[with(mydata, !(ave(case=="a",id,FUN=sum)>=3) ),]

#   id case value
#6   2    a     1
#7   2    a     1
#8   2    c     2
#9   2    c     2
#14  4    a     1
#15  4    b     1
#16  4    c     2
#17  4    a     2
#18  4    b     2
thelatemail
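The ave() one-liner in the first answer can be unpacked into named steps (the names a_per_row, keep and result are illustrative only). ave() applies sum to the logical vector case == "a" within each id group and writes the group's total back onto every row of that group, so the counts line up row for row with mydata and can be used directly as a row filter.

```r
# Example data from the question
mydata <- data.frame(
  id    = c(1,1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,4),
  case  = c("a","b","c","a","a","a","a","c","c","a","a","a","c","a","b","c","a","b"),
  value = c(1,1,1,1,1,1,1,2,2,2,2,1,1,1,1,2,2,2)
)

# For each row: how many "a" rows its id group contains
a_per_row <- with(mydata, ave(case == "a", id, FUN = sum))

# Keep rows whose group has fewer than 3 a's (same as !(a_per_row >= 3))
keep <- a_per_row < 3

result <- mydata[keep, ]
result
```

Because a_per_row has one value per row of mydata (18 here), rather than one per ID, it can index the rows directly without a separate lookup of which IDs to keep.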

Something like:

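library(dplyr)

# IDs whose number of "a" rows is below 3
# (tally() sums the logical case == 'a' within each id group)
pos <- mydata %>% group_by(id) %>%
         tally(case == 'a') %>% filter(n < 3) %>% pull(id)

# keep all rows belonging to those IDs
mydata %>% filter(id %in% pos)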

should work.

Davide Passaretti
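The dplyr answer works in two steps: find the IDs with fewer than three a's, then keep only their rows. For comparison, the same two-step idea can be sketched in base R with tapply() and %in%. This is a translation for readers without dplyr, not part of the original answer, and a_counts, keep_ids and result are illustrative names.

```r
# Example data from the question
mydata <- data.frame(
  id    = c(1,1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,4),
  case  = c("a","b","c","a","a","a","a","c","c","a","a","a","c","a","b","c","a","b"),
  value = c(1,1,1,1,1,1,1,2,2,2,2,1,1,1,1,2,2,2)
)

# Step 1: number of a's per id, as a vector named by id
a_counts <- tapply(mydata$case == "a", mydata$id, sum)

# Step 2: ids with fewer than 3 a's (names are character, so convert back to numeric)
keep_ids <- as.numeric(names(a_counts)[a_counts < 3])

# Step 3: keep every row whose id is in that set
result <- mydata[mydata$id %in% keep_ids, ]
result
```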