This is my data frame, `myproject`:

myproject <- data.frame(
  Participant                = 1:5,
  `futuremw:1`               = c(1L, 2L, 1L, 1L, NA),
  `pastmw:1`                 = c(1L, 1L, 2L, 1L, NA),
  `proportionfuturepast:1`   = c(4L, 7L, 1L, 2L, NA),
  my_video_item_duration_min = c(5, 1, 7.02, 6, 6),
  check.names = FALSE
)

I want to exclude participants whose `my_video_item_duration_min` value is less than 5 or greater than 7. To do so, I apply this dplyr code:

myproject_filtered = myproject %>%
  filter(my_video_item_duration_min > 5) %>%
  filter(my_video_item_duration_min < 7)

Now I want to exclude a participant whenever `futuremw:1` is different from 2 and `pastmw:1` is equal to 1 and `proportionfuturepast:1` is different from 3. Participant 4 should therefore be excluded, because all three exclusion criteria are met at the same time. If only one or two of the criteria are met, the participant is not excluded. Furthermore, I want to keep participant 5, even though that row contains NA values.

I've tried this:

myproject_filtered = myproject %>%
  filter(my_video_item_duration_min > 5) %>%
  filter(my_video_item_duration_min < 7) %>%
  filter(futuremw_1 != 2 | pastmw_1 == 1 | proportionfuturepast_1 != 3)

I have used the code proposed in the answer and it works. However, I now want to combine several exclusion criteria. The following code does not work:

myproject_excluding_participants = myproject %>%
  filter(
    my_video_item_duration_min >= 5,
    my_video_item_duration_min <= 7,
    ! complete.cases(.) | mind_wandering_1 != 1 | proportionMW_1 != 11,
    ! complete.cases(.) | mind_wandering_1 != 2 | proportionMW_1 == 11,
    ! complete.cases(.) | futuremw_1 != 2 | pastmw_1 != 2 | proportionfuturepast_1 == 4,
    ! complete.cases(.) | futuremw_1 != 1 | pastmw_1 != 2 | proportionfuturepast_1 == 1,
    ! complete.cases(.) | futuremw_1 != 2 | pastmw_1 != 1 | proportionfuturepast_1 == 7,
    ! complete.cases(.) | futuremw_1 != 1 | pastmw_1 != 1 | proportionfuturepast_1 != 1,
    ! complete.cases(.) | futuremw_1 != 1 | pastmw_1 != 1 | proportionfuturepast_1 != 7,
    ! complete.cases(.) | ED_1 != 1 | proportionED_1 != 11,
    ! complete.cases(.) | ED_1 != 2 | proportionED_1 != 11,
    ! complete.cases(.) | proportionfuturepast_dailylife_1 != 1 | futureMW_dailylife_1 != 5,
    ! complete.cases(.) | proportionfuturepast_dailylife_1 != 1 | pastMW_dailylife_1 == 5,
    ! complete.cases(.) | proportionfuturepast_dailylife_1 != 7 | pastMW_dailylife_1 != 5,
    ! complete.cases(.) | proportionfuturepast_dailylife_1 != 7 | futureMW_dailylife_1 == 5,
    ! complete.cases(.) | futureMW_dailylife_1 != 5 | pastMW_dailylife_1 != 5 | proportionfuturepast_dailylife_1 == 4,
    ! complete.cases(.) | CurrentConcernsAreas_14 != 1 | SumCurrentConcernsAreas1to13 < 0
  )
Gianluca
  • `futuremw:1` is not a legal variable name: the colon suggests this is a sequence. Do you mean instead `\`futuremw:1\`` (notice the additional backticks)? – r2evans Sep 11 '17 at 16:26
  • Instead of an *image* of your data, please insert a sample of your data itself, making this question a little more reproducible. There are [good examples](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) of how to do this for easy consumption. (Furthermore, your image includes variable names such as `\`pastmw:1\``, but your code showed `pastmw_1 == 3`; please be consistent in what you have and what code you use.) – r2evans Sep 11 '17 at 16:29
  • So your intended resulting dataset is empty? The "duration" filters keep 4-5. Participant 5 is likely going away because you don't say how to deal with `NA`. And 4 is excluded due to your other rule. – r2evans Sep 11 '17 at 16:41
  • Not at all. Participant 1 should not be excluded, as his duration was 5 and not, for example, 4.9. Participant 5 should not be excluded either: I want to include those presenting NA. – Gianluca Sep 11 '17 at 20:49
  • You use `> 5`, which excludes participant 1. – r2evans Sep 11 '17 at 20:51
  • Just edit your question to reflect the non-strict inequality. With that out of the way, does the answer work? – r2evans Sep 11 '17 at 20:56
  • What about including those presenting NA? – Gianluca Sep 11 '17 at 21:02
  • Your first criteria (`>= 5` and `< 7`) drops participants 2 and 3. Your second triplet-criteria drops 1 and 4. The only participant remaining is 5. If this is not correct, then either the logic or your communication of it is incorrect. – r2evans Sep 11 '17 at 22:53
  • One thing I think you are mistaking is the purpose of `filter`: a `TRUE` tells it to *keep* a row, not *exclude* it, so your logic of `futuremw_1 != 2 | ...` means to *keep* rows that meet those combined criteria. I think you mean to reverse your logic with `! (futuremw_1 != 2 & pastmw_1 == 1 & proportionfuturepast_1 != 3)` which is logically equivalent to `futuremw_1 == 2 | pastmw_1 != 1 | proportionfuturepast_1 == 3` (wrong variable names notwithstanding). – r2evans Sep 11 '17 at 23:14
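
The equivalence stated in the last comment (De Morgan's law) can be checked in base R. The sketch below is only an illustration: the column names `futuremw`, `pastmw`, and `proportion` and the grid of values are hypothetical stand-ins for the real columns, and NA is included because R's logical operators are three-valued.

```r
# Every combination of illustrative values for the three columns, NA included.
# These short column names are hypothetical stand-ins, not the real ones.
grid <- expand.grid(
  futuremw   = c(1L, 2L, NA),
  pastmw     = c(1L, 2L, NA),
  proportion = c(2L, 3L, NA)
)

# "Keep" written as the negation of the full exclusion rule
keep_negated <- with(grid, !(futuremw != 2 & pastmw == 1 & proportion != 3))

# The same rule after distributing the negation (De Morgan's law)
keep_expanded <- with(grid, futuremw == 2 | pastmw != 1 | proportion == 3)

# Compares the two logical vectors element by element, NA positions included
same_rule <- identical(keep_negated, keep_expanded)
print(same_rule)
```

Either form can be passed to `filter()`; the negated form stays closer to how the exclusion rule is worded in the question.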

1 Answer

How about inverting your logic to define what to keep rather than what to exclude:

library(dplyr)
myproject %>%
  filter(
    my_video_item_duration_min >= 5,
    my_video_item_duration_min < 7,
    ! complete.cases(.) | `futuremw:1` == 2 | `pastmw:1` != 1 | `proportionfuturepast:1` == 3
  )

Your data:

myproject <- data.frame(
  Participant                = 1:5,
  `futuremw:1`               = c(1L, 2L, 1L, 1L, NA),
  `pastmw:1`                 = c(1L, 1L, 2L, 1L, NA),
  `proportionfuturepast:1`   = c(4L, 7L, 1L, 2L, NA),
  my_video_item_duration_min = c(5, 1, 7.02, 6, 6),
  check.names = FALSE
)
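
To see which condition removes which participant, each condition can be evaluated as a plain logical vector. This is a minimal base-R sketch on the sample data; the helper names `in_range`, `has_na`, and `keep_rule` are illustrative and do not appear in the original code.

```r
myproject <- data.frame(
  Participant                = 1:5,
  `futuremw:1`               = c(1L, 2L, 1L, 1L, NA),
  `pastmw:1`                 = c(1L, 1L, 2L, 1L, NA),
  `proportionfuturepast:1`   = c(4L, 7L, 1L, 2L, NA),
  my_video_item_duration_min = c(5, 1, 7.02, 6, 6),
  check.names = FALSE
)

# Condition 1 and 2: duration within [5, 7)
duration <- myproject$my_video_item_duration_min
in_range <- duration >= 5 & duration < 7

# Rows with at least one NA are kept unconditionally by the third condition
has_na <- !complete.cases(myproject)

# Condition 3: keep a row unless all three exclusion criteria hold
keep_rule <- has_na |
  myproject$`futuremw:1` == 2 |
  myproject$`pastmw:1` != 1 |
  myproject$`proportionfuturepast:1` == 3

# One row per participant, showing the outcome of each condition
print(data.frame(Participant = myproject$Participant, in_range, has_na, keep_rule))

# Participants that pass every condition
kept <- myproject$Participant[in_range & keep_rule]
print(kept)
```

On this sample, participants 2 and 3 fail the duration range, participants 1 and 4 meet all three exclusion criteria, and only participant 5 passes every condition.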
r2evans
  • Maybe worth explaining: `x|y|z` may evaluate to NA, in which case `filter` treats it as FALSE, eg `mtcars %>% slice(1:3) %>% filter(c(TRUE, FALSE, NA))` – Frank Sep 11 '17 at 21:01
  • It doesn't work, as no participant is included. This is the output: `[1] Participant futuremw:1 pastmw:1 [4] proportionfuturepast:1 my_video_item_duration_min <0 rows> (or 0-length row.names)` – Gianluca Sep 11 '17 at 22:50
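
The NA behaviour mentioned in the comments can be reproduced on a small example. This is a sketch that assumes dplyr is installed; the data frame `toy` and its columns are hypothetical and are not taken from the question.

```r
library(dplyr)

# In R's three-valued logic an OR is TRUE as soon as one operand is TRUE;
# with no TRUE operand, an NA operand makes the whole result NA.
true_or_na  <- TRUE | NA
false_or_na <- FALSE | NA

# Hypothetical toy data: the third row has a missing value
toy <- data.frame(id = 1:3, x = c(2L, 1L, NA))

# filter() keeps only rows whose condition is TRUE, so the NA row is dropped
only_match <- toy %>% filter(x == 2)

# An explicit is.na() test turns the NA row into a TRUE and keeps it
match_or_na <- toy %>% filter(is.na(x) | x == 2)

print(only_match$id)
print(match_or_na$id)
```

Testing `is.na()` on the specific column is narrower than `! complete.cases(.)`, which keeps a row when any column in the data frame is missing.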