
I am working on a dataset of participants who are followed for a year and perform a physical test about once a month; the test dates vary between individuals. Between tests they answer psychological questions twice a day, and I would like to analyse the data collected between the physical tests. I therefore want to keep, for every individual, all rows between the first and the last test day, preserving all psychological data in between.

My current approach is based upon this answer: How to filter rows in dataframe that are not within a certain timeframe of an event value in another row?

library(dplyr)

# Simplified dataset

set.seed(1)
day_count <- c(1:8,12:20,14:26)
date <- as.Date(c(1:8,12:20,14:26), origin = Sys.Date())
id <- c(rep("A",9),rep("B",9),rep("C",12))
mood <- c(sample(1:100, 9),sample(1:100, 9),sample(1:100, 12))
ISRT <- c(c(NA,100,NA,NA,NA,NA,NA,90,NA),
        c(NA,NA,70,NA,NA,NA,80,NA,NA),
        c(90,NA,NA,100,NA,NA,50,NA,NA,NA,10,NA))

# tibble() replaces the deprecated data_frame()
dat <- tibble(day_count, date, id, mood, ISRT)

dat <- dat %>% mutate(test_day = !is.na(ISRT))

dat_between_tests <- dat %>%
  mutate(date = as.Date(date, format = "%Y-%m-%d")) %>%
  group_by(id) %>%
  filter(Reduce(`|`, purrr::map(date[test_day == TRUE],
                                ~ dplyr::between(date, .x - 1, .x + 1))))

I included one day before and after each test day because the approach does not work without that margin (ideally I would not need it). On this simplified example the approach seems to work, but when I run it on my own dataset I receive the following error:

Error:
! Problem with `filter()` input `..1`.
ℹ Input `..1` is `Reduce(...)`.
✖ Input `..1` must be of size 172 or 1, not size 0.
ℹ The error occurred in group 4: id = "1cf91d6c2f7ddfbd68b93dbc04a4c667".

Does anyone know what causes this error and how I can resolve it? Could it be related to there being multiple test days throughout the study period?

Jur
  • It's hard to debug the error without seeing what data is in that group, but maybe try filtering for `id == "1cf91d6c2f7ddfbd68b93dbc04a4c667"` and see if there is anything unusual in the formatting of that `id`'s data? – Stuart Demmer May 17 '22 at 07:48
  • Thanks for your answer! At first I thought the problem was that id was stored as a character, but the error persists after I recoded the variable as a factor. Looking at the data, I think the problem is that this specific person has no test day at all. That would mean I do not need to include this participant in the desired dataset. Is there a way I can express that in my filter? – Jur May 17 '22 at 09:14
  • See my suggested answer below. It also seems like this person may have many (172?) mood readings. Might be worth confirming whether they really did not have any test days before simply excluding them? – Stuart Demmer May 17 '22 at 09:27
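
The diagnosis in the comments above (a participant with no test day) can be reproduced in isolation. The snippet below is a minimal sketch on hypothetical data, not the poster's dataset: when `date[test_day]` is empty, `purrr::map()` returns an empty list and `Reduce()` returns `NULL`, which has size 0, and that matches the "not size 0" part of the error message. The names `windows`, `keep` and `keep_safe` are made up for illustration.

```r
# Hypothetical group with no test days, mirroring the failing id
date <- as.Date("2022-01-01") + 0:4
test_day <- rep(FALSE, 5)

# No test dates, so map() has nothing to iterate over and returns list()
windows <- purrr::map(date[test_day], ~ dplyr::between(date, .x - 1, .x + 1))
length(windows)

# Reduce() over an empty list without a starting value returns NULL (length 0)
keep <- Reduce(`|`, windows)
is.null(keep)

# Passing a starting value yields a full-length logical vector instead
keep_safe <- Reduce(`|`, windows, rep(FALSE, length(date)))
keep_safe
```

Giving `Reduce()` an all-`FALSE` starting value is one possible way to make the original filter tolerate such groups: they would then simply lose all their rows rather than raise an error.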

1 Answer


To exclude that particular id from your dataset you could try:

dat_between_tests <- dat %>%
  mutate(date = as.Date(date, format="%Y-%m-%d")) %>%   
  filter(id != "1cf91d6c2f7ddfbd68b93dbc04a4c667") %>% # this should exclude the id with no test days
  group_by(id) %>%
  filter(Reduce(`|`, 
                purrr::map(date[test_day == TRUE], 
                           ~dplyr::between(date, .x -1, .x + 1))))
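
If more than one participant lacks test days, hard-coding ids may become tedious. The following is a sketch of an alternative, not the approach from the linked answer: it drops any group without a test day using `any(test_day)`, then keeps every row from the first to the last test day, which is the goal stated in the question. The data in `toy` is hypothetical, and `between_tests` is an illustrative name.

```r
library(dplyr)

# Hypothetical toy data: participant "C" has no test days at all
toy <- tibble(
  id   = rep(c("A", "B", "C"), each = 5),
  date = rep(as.Date("2022-01-01") + 0:4, times = 3),
  ISRT = c(NA, 100, NA, 90, NA,
           70, NA, NA, NA, 80,
           NA, NA, NA, NA, NA)
) %>%
  mutate(test_day = !is.na(ISRT))

between_tests <- toy %>%
  group_by(id) %>%
  # Drop participants without any test day (one TRUE/FALSE per group)
  filter(any(test_day)) %>%
  # Keep rows from the first to the last test day, inclusive
  filter(date >= min(date[test_day]), date <= max(date[test_day])) %>%
  ungroup()

between_tests
```

The two `filter()` calls are kept separate on purpose: once the groups without test days are gone, `min()` and `max()` are never evaluated on an empty vector.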
Stuart Demmer
  • Thank you for the answer. There appeared to be three participants with no test data. I filtered them out based on your solution and this seemed to work: ``` test_days <- dat[which(dat$test_day==T),] ids_test_players <- unique(test_days$id) dat_between_tests <- dat %>% mutate(date = as.Date(date, format="%Y-%m-%d")) %>% filter(id %in% ids_test_players) %>% group_by(id) %>% filter(Reduce(`|`, purrr::map(date[test_day == TRUE], ~dplyr::between(date, .x -1 , .x + 1)))) ``` – Jur May 17 '22 at 11:32