Delete all rows based on corresponding values in multiple columns

Question

Edited question:

I would like to subset/filter a new dataframe based on multiple conditions. I tried the following code mentioned here (Subset data frame based on multiple conditions) and (Remove group from data.frame if at least one group member meets condition)

A small portion of total database:

df<- structure(list(pat_id = c(10302, 10302, 10302, 
                          10482, 10482,10482,
                          10613, 10613, 10613, 
                          16190, 16190, 16190, 
                          16220, 16220,16220, 16220, 16220, 16220, 16220, 16220), 
               date = c("2014-04-22","2018-12-13", "2020-07-27", "2019-07-15", "2019-09-19", "2019-09-23", 
                         "2015-09-29", "2015-10-06", "2015-11-20", "2013-07-08", "2018-01-30", 
                         "2020-01-09", "2016-06-15", "2018-02-23", "2019-02-14", "2019-08-09", 
                         "2020-03-02", "2020-07-03", "2020-11-09", "2020-12-16"), 
               number = c(1,2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 4, 5, 6, 7, 8), 
               col1 = c(0,1, 1, 2, 4, 4, 9, 3, 1, 0, 1, 1, 9, 9, 9, 9, 9, 9, 9, 9), 
               col2 = c(NA_real_,NA_real_, NA_real_, 0, 1, NA_real_, NA_real_, NA_real_, 
                        NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
                        NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), 
               col3 = c(NA_real_,NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
                        NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
                        NA_real_, NA_real_, NA_real_, NA_real_, NA_real_)), 
                class = c("grouped_df", "tbl_df", "tbl", "data.frame"), row.names = c(NA, -20L), groups = structure(list(
               pat_id = c(10302, 10482, 10613, 16190, 16220), .rows = structure(list(
                        1:3, 4:6, 7:9, 10:12, 13:20), ptype = integer(0), class = c("vctrs_list_of", 
                        "vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
                        ), row.names = c(NA, -5L), .drop = TRUE))

I want to create a new dataframe based on the following conditions.

If the number is 1 or 2 AND col1, col2 or col3 is 1, then delete all the rows with the corresponding id value.

Desired output:

id        date    number    col1     col2     col3
10613      ..      1         9        NA       NA
10613      ..      2         3        NA       NA
10613      ..      3         1        NA       NA
etc

I've tried df1 <- df %>% group_by(pat_id) %>% filter(any(!(number <= 2 & (col1 == 1 | col2==1 | col3==1))))

But this does not seem to work. Could it be because of the class/structure of the dataframe? I cant figure it out. If i create a 'dummy' dataframe with similar columns this code does work. But not on the big dataset.

Any tips?

You mention "multiple columns" but in your example condition is on only one "number" column, is this correct data? — zx8754, Aug 16 '23 at 09:01

ThomasIsCoding · Answer 1 · 2023-08-16T08:53:45.907

4

A base option with subset + ave

> subset(df, ave(number == 1, id) == 0)
  id number
6 12      2
7 13      2
8 13      3

Idea Behind

If you don't specify the function to be used in ave, by default ave will calculate the mean value of the first argument (grouped by the following arguments). In your case, if there is no 1 in the column number, the mean should be 0.

edited Aug 16 '23 at 08:53

answered Aug 16 '23 at 08:31

ThomasIsCoding

96,636
9
24
81

score 3 · Accepted Answer · answered Aug 16 '23 at 08:24

First of all, make sure your number columns are numeric. After that you can group_by per id and filter if all numbers are true based on your condition like this:

library(dplyr)

df %>%
  group_by(id) %>%
  filter(all(number > 1))
#> # A tibble: 3 × 2
#> # Groups:   id [2]
#>   id    number
#>   <chr>  <dbl>
#> 1 12         2
#> 2 13         2
#> 3 13         3

^{Created on 2023-08-16 with reprex v2.0.2}

Data used:

id <- c('10','10','10','11', '11', '12', '13', '13', '14', '15', '15')
number <- c(1, 2,3, 1, 2, 2, 2, 3,1 ,1,2)
df <- data.frame(id, number)

Delete all rows based on corresponding values in multiple columns

2 Answers2

Idea Behind