
I have a dataset with several date variables and want to create subsets that keep only the rows whose dates fall in a given range. To be more precise: each row in the dataset represents a patient case in a psychiatric clinic and contains all seclusions applied during that case. A case either has no seclusion, or its seclusions are documented as seclusion_date1, seclusion_date2, ..., seclusion_enddate1, seclusion_enddate2, ... (depending on how many seclusions took place). My plan is to create a subset with only those cases where either no seclusion is documented, or seclusion_date1 (the first seclusion) is after 2019-06-30 and all seclusion end dates (1, 2, 3, ...) are before 2020-05-01. Cases with seclusions before 2019-06-30 or after 2020-05-01 should be excluded.

I'm very new to R, so my attempts may well be quite wrong. I appreciate any help or ideas.

I tried the subset function in R. To filter on all possible seclusion_enddate columns at once, I tried using starts_with, and I also tried writing a loop.

```r
all_seclusion_enddates <- function() {
  c(WMdata, any_of(c("seclusion_enddate")), starts_with("seclusion_enddate"))
}
```

which fails with:

```
Error: `any_of()` must be used within a selecting function.
```

And then my plan would have been:

```r
cohort_2_before <- subset(WMdata, seclusion_date1 >= "2019-07-01" & all_seclusion_enddates <= "2020-04-30")
```

The loop I tried:

```r
for (i in 1:53) {
  cohort_2_before <- subset(WMdata,
                            seclusion_date1 >= "2019-07-01" &
                              ((paste0("seclusion_enddate", i))) <= "2020-04-30" &
                              restraint_date1 >= "2019-07-01" &
                              ((paste0('seclusion_enddate', i))) <= "2020-04-30")
}
```

Result: a subset with 0 obs. was created.

Iokaste
  • Can you make your post [reproducible](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) and provide your dataset using `dput(WMdata)`? – jrcalabrese Jan 06 '23 at 15:40
  • Does this answer your question? [R: How to filter/subset a sequence of dates](https://stackoverflow.com/questions/28335715/r-how-to-filter-subset-a-sequence-of-dates) – divibisan Jan 06 '23 at 16:41
  • You can format code more prettily by enclosing it in triple backticks (see https://stackoverflow.com/help/formatting) – Christian Severin Jan 13 '23 at 12:52

1 Answer


Since you don't provide a reproducible example, I can't see your specific problem, but I can help with the core issue.

`any_of`, `starts_with` and similar helpers come from the tidyselect package that the tidyverse uses to select columns. They only work inside selecting functions such as `select()` or `vars()`, which is why you got that error. They are probably the tools I'd use to solve this problem, though, so here's how you can use them:
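A quick illustration of that point with the built-in `iris` data: the same helpers that raised the error work once they sit inside a selecting function such as `select()`. This is a minimal sketch; `sepal_cols` and `species_col` are just illustrative names.

```r
library(dplyr)

# starts_with() is valid here because select() is a selecting function
sepal_cols <- iris %>% select(starts_with("Sepal"))
names(sepal_cols)
#> [1] "Sepal.Length" "Sepal.Width"

# any_of() is also valid inside select(); names that don't exist are skipped
species_col <- iris %>% select(any_of(c("Species", "not_a_column")))
names(species_col)
#> [1] "Species"
```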

Starting with the built-in dataset `iris`, we use the `filter_at` function from dplyr (enter `?filter_at` in the R console to read the help). This function filters (keeps specific rows of) a data.frame (given to the `.tbl` argument) based on a condition (given to the `.vars_predicate` argument), which is applied to the columns chosen by the selectors given to the `.vars` argument.

```r
library(dplyr)

iris %>%
    filter_at(vars(starts_with('Sepal')), all_vars(.>4))
```

```
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.7         4.4          1.5         0.4  setosa
2          5.2         4.1          1.5         0.1  setosa
3          5.5         4.2          1.4         0.2  setosa
```

In this example, we take the data frame `iris`, pass it into `filter_at` with the `%>%` pipe, tell it to look only at columns whose names start with 'Sepal', and then keep the rows where all the selected columns meet the condition `. > 4` (the `.` stands for each selected column in turn). If we wanted rows where any of the selected columns met the condition, we could use `any_vars(. > 4)`.
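As a side note, `filter_at()` and the other scoped verbs (`_at`, `_if`, `_all`) have been superseded since dplyr 1.0.0; they still work, but newer code tends to use `if_all()` and `if_any()` inside a plain `filter()`. A sketch of the same filter in that style, assuming dplyr 1.0.4 or later (where these two functions were added); the formula's `.x` stands for each selected column:

```r
library(dplyr)

# Same rows as the filter_at(..., all_vars(. > 4)) call above
sepal_big <- iris %>%
  filter(if_all(starts_with("Sepal"), ~ .x > 4))
nrow(sepal_big)
#> [1] 3

# if_any() keeps rows where at least one selected column passes;
# every iris row has Sepal.Length > 4, so all rows are kept
sepal_any <- iris %>%
  filter(if_any(starts_with("Sepal"), ~ .x > 4))
nrow(sepal_any)
#> [1] 150
```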

You can add more conditions by piping the result into further filter functions:

```r
iris %>%
    filter_at(vars(starts_with('Sepal')), all_vars(.>4)) %>%
    filter(Petal.Width > 0.3)
```

```
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.7         4.4          1.5         0.4  setosa
```

Here we filter the previous result again to keep only the rows that also have `Petal.Width > 0.3`.
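The two steps can also be written as a single `filter()` call, because `filter()` combines comma-separated conditions with AND. A minimal sketch, again assuming dplyr 1.0.4 or later for `if_all()`:

```r
library(dplyr)

# Both conditions in one call: all Sepal columns > 4 AND Petal.Width > 0.3
result <- iris %>%
  filter(
    if_all(starts_with("Sepal"), ~ .x > 4),
    Petal.Width > 0.3
  )
nrow(result)
#> [1] 1
```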

In your case, first make sure your date columns are stored as Date values (convert them with `as.Date`), then filter on `seclusion_date1` and on `vars(starts_with('seclusion_enddate'))`.
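Putting this together, a sketch of the seclusion filter. The data frame `wm` below is a small hypothetical stand-in for `WMdata`: it assumes the dates are stored as text in `YYYY-MM-DD` form and that a case without seclusions has `NA` in every seclusion column. Because a case with a single seclusion also has `NA` in `seclusion_enddate2` and later columns, each end date is accepted when it is either missing or on/before 2020-04-30; without the `is.na()` check those cases would be dropped. It uses `if_all()` (dplyr 1.0.4 or later) instead of `filter_at()` so everything fits into one `filter()` call.

```r
library(dplyr)

# Hypothetical stand-in for WMdata (assumption: dates stored as "YYYY-MM-DD"
# text, NA in all seclusion columns when a case has no seclusion)
wm <- data.frame(
  case_id            = 1:5,
  seclusion_date1    = c(NA, "2019-08-01", "2019-05-10", "2019-09-15", "2019-10-01"),
  seclusion_enddate1 = c(NA, "2019-08-02", "2019-05-11", "2019-09-16", "2019-10-02"),
  seclusion_enddate2 = c(NA, NA,           NA,           "2020-06-01", "2020-03-01"),
  stringsAsFactors   = FALSE
)

cohort <- wm %>%
  # Convert every seclusion date column from character to Date
  mutate(across(starts_with("seclusion_"), as.Date)) %>%
  filter(
    # No seclusion at all, or the first seclusion starts on/after 2019-07-01
    is.na(seclusion_date1) | seclusion_date1 >= as.Date("2019-07-01"),
    # Every end date is missing (assumed: no such seclusion) or on/before 2020-04-30
    if_all(starts_with("seclusion_enddate"),
           ~ is.na(.x) | .x <= as.Date("2020-04-30"))
  )

cohort$case_id
#> [1] 1 2 5
```

With these made-up rows, case 1 (no seclusion), case 2 and case 5 are kept, while case 3 (first seclusion before 2019-07-01) and case 4 (second seclusion ending after 2020-04-30) are dropped. A case without seclusions passes the second condition too, since all its end dates are `NA`.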

divibisan
  • Thank you! I tried it like this: cohort2_before %>% filter_at(vars(as.Date(starts_with("seclusion_enddate")) <= "2020-04-30")) %>% filter(as.Date(seclusion_date1) >= "2019-07-01"). I get this error: Error in `filter_at()`: ! Can't subset columns with `as.Date(starts_with("seclusion_enddate")) <= "2020-04-30"`. x `as.Date(starts_with("seclusion_enddate")) <= "2020-04-30"` must be numeric or character, not a logical vector. Do you know what I can do here? Sorry for not making my question clearer or reproducible. Still very new here and I'm struggling a bit with it. – Iokaste Jan 09 '23 at 11:34
  • Take a look at my example code again. `filter_at` wants 2 arguments. The first is the selector function (with `vars` and `starts_with`) that picks the variables to filter on, while the second is the function to use to filter (where you'd compare a date to the date value). You seem to have mixed them all together. You should practice on a simpler dataset, like `iris` to figure out how to use filter. – divibisan Jan 10 '23 at 18:44
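Following up on that comment: in the attempt above, `as.Date()` and the comparison ended up inside `vars()`, which only accepts column selectors. A minimal sketch with the two parts kept in separate, named arguments of `filter_at()`. The data are hypothetical and assume the date columns were already converted with `as.Date()`, with `NA` meaning "no such seclusion":

```r
library(dplyr)

# Hypothetical example data, dates already converted with as.Date()
wm <- data.frame(
  case_id            = 1:3,
  seclusion_date1    = as.Date(c(NA, "2019-08-01", "2019-09-15")),
  seclusion_enddate1 = as.Date(c(NA, "2019-08-02", "2019-09-16")),
  seclusion_enddate2 = as.Date(c(NA, NA, "2020-06-01"))
)

cohort <- wm %>%
  filter_at(
    # Selector: which columns to check
    .vars = vars(starts_with("seclusion_enddate")),
    # Predicate: `.` is each selected column; NA assumed to mean "no seclusion"
    .vars_predicate = all_vars(is.na(.) | . <= as.Date("2020-04-30"))
  ) %>%
  filter(is.na(seclusion_date1) | seclusion_date1 >= as.Date("2019-07-01"))

cohort$case_id
#> [1] 1 2
```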