0

Sample Data

data =data.frame(id=c(1,1,1,2,2,2,3,3,3,4,4,4),
                 score=c(5,7,6,9,8,4,NA,11,3,7,NA,10))

So in this example if id has any scores equal to 7 than I want to remove those ids to get a new data frame such as:

data2 =data.frame(id=c(2,2,2,3,3,3),
                     score=c(9,8,4,NA,11,3))

I tried data[data$score != 7,] but this only works for a row and not the group.

Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
bvowe
  • 3,004
  • 3
  • 16
  • 33

6 Answers6

3

Use subset keeping any group for which !any(x == 7, na.rm = TRUE) is TRUE. This one-liner uses only base R.

subset(data, !ave(score, id, FUN = function(x) any(x == 7, na.rm = TRUE)))

giving:

  id score
4  2     9
5  2     8
6  2     4
7  3    NA
8  3    11
9  3     3
G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341
2

If you would like a solution that doesn't require any packages, you could try:

data[!(data$id %in% data$id[data$score == 7]) , ]


  id score
4  2     9
5  2     8
6  2     4
7  3    NA
8  3    11
9  3     3

To explain a bit, the data$id[data$score == 7] bit finds the id when the score is 7. We then use %in% to find a logical vector when the id in our original data frame is one of those things with data$id %in% data$id[data$score == 7]. And then we surround that with ! to remove those ids.

Perhaps a comically high level of overkill here, but we can benchmark all of the options presented so far:

library(dplyr)
library(microbenchmark)

microbenchmark(`G. Grothendieck` = subset(data, !ave(score, id, FUN = function(x) any(x == 7, na.rm = TRUE))), 
           `Nick Criswell` = data[!(data$id %in% data$id[data$score == 7]) , ],
           divibisan = data %>%
             group_by(id) %>%
             filter(!(7 %in% score)),
           arg0naut = data %>%
             anti_join(data %>% filter(score == 7), by = "id"),
           tmfmnk = data %>%
             group_by(id) %>%
             filter(!any(score == 7, na.rm = TRUE)),
           `d.b` = data[!data$id %in% split(data$id, data$score)$`7`,])


     Unit: microseconds
            expr     min      lq     mean   median       uq       max neval
 G. Grothendieck 160.001 177.455 189.4648 185.4545 195.6370   269.576   100
   Nick Criswell  37.819  45.091  52.2820  53.8190  57.2130    93.576   100
       divibisan 443.636 456.000 480.1211 464.0000 489.4545   904.726   100
        arg0naut 733.091 757.818 806.7143 766.0600 805.3325  1543.755   100
          tmfmnk 444.121 457.939 704.8916 463.0300 479.5150 22332.079   100
             d.b 103.759 114.424 125.3291 122.1825 131.8800   202.182   100
Nick Criswell
  • 1,733
  • 2
  • 16
  • 32
1

In dplyr we can group_by and filter within each group for the presence (or lack, since we invert with !) of a 7 in the score variable:

library(dplyr)
data %>%
    group_by(id) %>%
    filter(!(7 %in% score))

# A tibble: 6 x 2
# Groups:   id [2]
     id score
  <dbl> <dbl>
1     2     9
2     2     8
3     2     4
4     3    NA
5     3    11
6     3     3
divibisan
  • 11,659
  • 11
  • 40
  • 58
1

Another dplyr possibility:

data %>%
 group_by(id) %>%
 filter(!any(score == 7, na.rm = TRUE))

     id score
  <dbl> <dbl>
1     2     9
2     2     8
3     2     4
4     3    NA
5     3    11
6     3     3

Or:

data %>%
 group_by(id) %>%
 filter(!any(cumsum(ifelse(is.na(score), 0, score) == 7) >= 1))

Or the same with base:

data[!ave(data$score, data$id, FUN = function(x) any(cumsum(ifelse(is.na(x), 0, x) == 7) >= 1)), ]

  id score
4  2     9
5  2     8
6  2     4
7  3    NA
8  3    11
9  3     3

Or a possibility similar to @G. Grothendieck, but without subset():

data[!ave(data$score, data$id, FUN = function(x) any(x == 7, na.rm = TRUE)), ]
tmfmnk
  • 38,881
  • 4
  • 47
  • 67
1
data[!data$id %in% split(data$id, data$score)$`7`,]
#  id score
#4  2     9
#5  2     8
#6  2     4
#7  3    NA
#8  3    11
#9  3     3
d.b
  • 32,245
  • 6
  • 36
  • 77
0

We could do:

library(dplyr)

data %>%
  anti_join(data %>% filter(score == 7), by = "id")

Output:

  id score
1  2     9
2  2     8
3  2     4
4  3    NA
5  3    11
6  3     3
arg0naut91
  • 14,574
  • 2
  • 17
  • 38