
I am looking for a way to subset a data frame by multiple conditions. My data set consists of respondents with at least one and at most four children, each of whom is either a biological child or a stepchild, and I want to filter on this for each child in turn.

Thus, I am currently filtering the data like so:

SUBSET03 <- subset(
  SUBSET02,
  (ehc9k1 == 1 | ehc9k1 == 3) |
  (ehc9k1 == 1 | ehc9k1 == 3) & (ehc9k2 == 1 | ehc9k2 == 3) |
  (ehc9k1 == 1 | ehc9k1 == 3) & (ehc9k2 == 1 | ehc9k2 == 3) & (ehc9k3 == 1 | ehc9k3 == 3) |
  (ehc9k1 == 1 | ehc9k1 == 3) & (ehc9k2 == 1 | ehc9k2 == 3) & (ehc9k3 == 1 | ehc9k3 == 3) & (ehc9k4 == 1 | ehc9k4 == 3)
) #S

This doesn't work out - any tip on how to 'replace' the brackets?

    It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick Apr 20 '23 at 19:47

1 Answer


To simplify your logic, be aware that (ehc9k1 == 1 | ehc9k1 == 3) can be written as ehc9k1 %in% c(1, 3), which is shorter and reduces the number of brackets.

Assuming the variable ehc9k1 measures the number of children, your logic is incorrect. If you want a number between 1 and 3, you would have to use either ehc9k1 %in% c(1, 2, 3) or dplyr::between(ehc9k1, 1, 3).
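As a minimal sketch of the shorthand above, here is a comparison on made-up values (the vector below is hypothetical and only stands in for ehc9k1). It also shows one difference worth knowing: == yields NA for missing values, while %in% yields FALSE. Since subset() drops rows where the condition is NA, both forms should keep the same rows.

```r
# Hypothetical toy values standing in for ehc9k1 (not from the question's data)
ehc9k1 <- c(1, 2, 3, 4, NA)

# Original style: one comparison per allowed value; NA stays NA
long_form <- ehc9k1 == 1 | ehc9k1 == 3

# Shorthand: membership test; NA becomes FALSE
short_form <- ehc9k1 %in% c(1, 3)

# Both forms pick the same positions once NA is treated as "no match"
same_rows <- identical(which(long_form), which(short_form))

# Range check written out in base R, as an alternative to listing 1, 2, 3
in_range <- ehc9k1 >= 1 & ehc9k1 <= 3
```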

Otherwise, I like to use dplyr::filter() with if_any() or if_all(); see https://dplyr.tidyverse.org/reference/filter.html and https://dplyr.tidyverse.org/reference/across.html

library(dplyr)

SUBSET03 <- SUBSET02 %>% 
  dplyr::filter(
    # if_any() keeps rows where at least one selected column matches the condition
    # if_all() keeps rows where every selected column matches the condition
    if_all(.cols = c(ehc9k1, ehc9k2), \(x) x %in% c(1, 2, 3))
  )


SUBSET03 <- SUBSET02 %>% 
  filter(
    if_any(.cols = starts_with("ehc9k"), \(x) x %in% c(1, 2, 3))
  )

You can use select helpers to capture multiple column names if they share a common pattern: https://dplyr.tidyverse.org/reference/dplyr_tidy_select.html
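To see how if_any() and if_all() differ when combined with a select helper, here is a small sketch on an invented data frame. The values and the id column are assumptions for illustration only, and the allowed codes c(1, 3) are taken from the question rather than from any codebook.

```r
library(dplyr)

# Hypothetical toy data: one row per respondent, one column per child
SUBSET02 <- tibble(
  id     = 1:4,
  ehc9k1 = c(1, 3, 2, 1),
  ehc9k2 = c(3, 2, NA, 1),
  ehc9k3 = c(NA, 2, 2, 3)
)

# Keep respondents where at least one child column is 1 or 3
any_match <- SUBSET02 %>%
  filter(if_any(starts_with("ehc9k"), \(x) x %in% c(1, 3)))

# Keep respondents where every child column is 1 or 3
# (with %in%, a missing value counts as "no match")
all_match <- SUBSET02 %>%
  filter(if_all(starts_with("ehc9k"), \(x) x %in% c(1, 3)))
```

Note that if_all() treats a missing child as a failed condition here, so respondents with fewer than four children would be dropped unless the condition also allows NA, for example with is.na(x) | x %in% c(1, 3).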

If this doesn't answer your question, it would help if you provided a working example.

olivroy