Delete rows conditional on frequency of char variable in R

Question

Let's say that mydata has three columns. There are multiple rows for each ID, each with a corresponding "case" value (a character). I need to count the number of a's for each ID; if that count is >= 3, delete all rows for that ID, otherwise keep them.

What I have:

mydata <- data.frame(id=c(1,1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,4),
                     case=c("a","b","c","a","a","a","a","c","c","a","a","a","c","a","b","c","a","b"),
                     value=c(1,1,1,1,1,1,1,2,2,2,2,1,1,1,1,2,2,2))

What I need:

   id case value
6   2    a     1
7   2    a     1
8   2    c     2
9   2    c     2
14  4    a     1
15  4    b     1
16  4    c     2
17  4    a     2
18  4    b     2

I think this might cover it: http://stackoverflow.com/questions/18302610/remove-ids-that-occur-x-times-r , or this one too: http://stackoverflow.com/questions/24503279/return-df-with-a-columns-values-that-occur-more-then-once?lq=1 – thelatemail, Dec 04 '14 at 22:31
Probably not? They want to delete IDs based on the number of rows for each ID, whereas I want to count the number of a's for each ID. – user9292, Dec 04 '14 at 22:33
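The exchange above hinges on the difference between counting rows per ID and counting "a" values per ID. A minimal sketch, assuming `mydata` exactly as defined in the question (the names `rows_per_id` and `a_per_id` are illustrative), computes both with base R's `table()` and `tapply()`:

```r
mydata <- data.frame(id=c(1,1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,4),
                     case=c("a","b","c","a","a","a","a","c","c","a","a","a","c","a","b","c","a","b"),
                     value=c(1,1,1,1,1,1,1,2,2,2,2,1,1,1,1,2,2,2))

# Number of rows per id: what the linked questions filter on
rows_per_id <- table(mydata$id)

# Number of "a" values per id: what this question filters on
# (summing a logical vector counts its TRUE values)
a_per_id <- tapply(mydata$case == "a", mydata$id, sum)

print(rows_per_id)
print(a_per_id)
```

The row counts are 5, 4, 4, 5 while the "a" counts are 3, 2, 3, 2. ID 3 has only four rows but three a's, and ID 4 has five rows but only two a's, so no row-count threshold reproduces the desired output.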

score 1 · Accepted Answer · answered Dec 04 '14 at 22:35

There's probably a host of solutions, but here's one using base R's ave:

mydata[with(mydata, !(ave(case == "a", id, FUN = sum) >= 3)), ]

#   id case value
#6   2    a     1
#7   2    a     1
#8   2    c     2
#9   2    c     2
#14  4    a     1
#15  4    b     1
#16  4    c     2
#17  4    a     2
#18  4    b     2
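The one-liner can be unpacked into its intermediate steps. `ave()` returns a vector as long as its input in which each element is replaced by `FUN` applied to that element's group, so summing the logical `case == "a"` by `id` gives every row the number of a's in its ID. A sketch, assuming `mydata` from the question (the names `a_count`, `keep` and `result` are illustrative):

```r
mydata <- data.frame(id=c(1,1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,4),
                     case=c("a","b","c","a","a","a","a","c","c","a","a","a","c","a","b","c","a","b"),
                     value=c(1,1,1,1,1,1,1,2,2,2,2,1,1,1,1,2,2,2))

# Step 1: for every row, the number of "a" values in that row's id
# (ave() assigns each group's sum to every row of that group)
a_count <- with(mydata, ave(case == "a", id, FUN = sum))
print(a_count)

# Step 2: logical mask, TRUE for rows whose id has fewer than three a's
keep <- a_count < 3

# Step 3: subset the rows; all rows of an id are kept or dropped together
result <- mydata[keep, ]
print(result)
```

Writing the test as `a_count < 3` is equivalent to `!(a_count >= 3)` here because the counts are never `NA`.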

Davide Passaretti · Answer 2 · score 0 · 2014-12-04T22:55:10.230

Something like:

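library(dplyr)

# ids with fewer than three "a" values; tally() sums case == 'a' as a weight
pos <- unlist(mydata %>% group_by(id) %>%
                tally(case == 'a') %>% filter(n < 3) %>% select(id))

# keep only the rows that belong to those ids
mydata %>% filter(id %in% pos)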

should work.
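The pipeline above first collects the IDs with fewer than three a's and then keeps the rows whose `id` is in that set. For readers without dplyr, the same two-step idea can be sketched in base R; this is a swapped-in base R equivalent, not the answerer's code, and it assumes `mydata` from the question (`a_per_id`, `keep_ids` and `result` are illustrative names):

```r
mydata <- data.frame(id=c(1,1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,4),
                     case=c("a","b","c","a","a","a","a","c","c","a","a","a","c","a","b","c","a","b"),
                     value=c(1,1,1,1,1,1,1,2,2,2,2,1,1,1,1,2,2,2))

# Step 1 (like group_by + tally): number of "a" values per id
a_per_id <- tapply(mydata$case == "a", mydata$id, sum)

# Step 2 (like filter(n < 3) + select(id)): the ids that pass;
# tapply() stores the ids as character names, so convert them back to numbers
keep_ids <- as.numeric(names(a_per_id)[a_per_id < 3])

# Step 3 (like filter(id %in% pos)): keep every row of those ids
result <- mydata[mydata$id %in% keep_ids, ]
print(result)
```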

edited Dec 04 '14 at 22:55

answered Dec 04 '14 at 22:39


You could do that a bit shorter: `group_by(mydata, id) %>% filter(sum(case == "a") < 3)` – talat Dec 05 '14 at 17:57
Yes, I was close to a simple solution but could not find it. – Davide Passaretti Dec 05 '14 at 18:05
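talat's version evaluates the condition once per group and keeps or drops each group whole. As a rough base R analogue of that grouped `filter()` (not dplyr itself), one can split the data frame by `id`, keep the pieces that pass the test, and bind them back together. A sketch, assuming `mydata` from the question (`pieces`, `kept` and `result` are illustrative names):

```r
mydata <- data.frame(id=c(1,1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,4),
                     case=c("a","b","c","a","a","a","a","c","c","a","a","a","c","a","b","c","a","b"),
                     value=c(1,1,1,1,1,1,1,2,2,2,2,1,1,1,1,2,2,2))

# One data frame per id, in a list named by id
pieces <- split(mydata, mydata$id)

# Keep only the pieces with fewer than three "a" values
kept <- Filter(function(d) sum(d$case == "a") < 3, pieces)

# Stack the surviving pieces back into a single data frame
result <- do.call(rbind, kept)

# rbind() builds composite row names such as "2.6" from the list names; reset them
rownames(result) <- NULL
print(result)
```

The result comes back ordered by the levels of the split factor (here, ascending `id`), which happens to match the original row order in this example.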
