
When I calculate N for my 2 questionnaires, the values are different. I would like to keep only the participants who completed both questionnaires and did not skip the second one.

quest_distinct <-  quest_data %>%
  group_by(user_id, q_id) %>%
  filter(session_id == min(session_id), endtime == min(endtime)) %>%
  filter(row_number() == 1) %>%
  ungroup() %>%
  filter(user_status %in% c("guest", "registered"))

This is the code I have used so far to filter out test sessions.

Claire
  • 1
    Please show a small reproducible example with `dput` and the expected output so that it is easier to cross-check – akrun Feb 08 '21 at 17:55
  • 2
    Hi Laura, welcome to Stack Overflow. Please do not post screenshots of your data. Instead, [edit] your question with the output of `dput(quest_data)` or `dput(head(quest_data))` if your data is very large. You can use three backticks (`) to improve formatting. See [How to make a great R reproducible example](https://stackoverflow.com/a/5963610/) for more. – Ian Campbell Feb 08 '21 at 17:57

1 Answer


With dplyr, you can group_by user_id and use n_distinct on quest_id (or q_id - it is unclear based on previous post of data and code above). This assumes that quest_id is a unique value for each questionnaire. n_distinct(quest_id) will give you the count of unique questionnaires. If that value is 2, then that user has data for 2 different questionnaires, and you can filter to keep that user's data in your output. If you use dput with your sample data as suggested in the comments, we can probably provide additional assistance and demonstrate with your data.

library(dplyr)

quest_data %>%
  group_by(user_id) %>%
  filter(n_distinct(quest_id) == 2)
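
Until real data is posted, here is a minimal sketch of the same idea on made-up data. The `toy_data` tibble and its values are hypothetical, and the column name `quest_id` is an assumption (swap in `q_id` if that is what your data uses). It also assumes that a participant who skipped the second questionnaire has no rows for it.

```r
library(dplyr)

# Hypothetical toy data: users 1 and 3 have rows for both questionnaires,
# user 2 only has rows for questionnaire "A".
toy_data <- tibble(
  user_id  = c(1, 1, 2, 3, 3, 3),
  quest_id = c("A", "B", "A", "A", "B", "B")
)

# Keep only users with exactly 2 distinct questionnaires
both_quests <- toy_data %>%
  group_by(user_id) %>%
  filter(n_distinct(quest_id) == 2) %>%
  ungroup()

# N = number of participants with data for both questionnaires
n_both <- n_distinct(both_quests$user_id)
n_both
```

Calling `ungroup()` at the end is optional, but it avoids carrying the grouping into later steps. If skipped questionnaires still produce a row in your data, you would need an extra condition on whatever column marks completion.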
Ben