Data context: People responding to each other on an online discussion board

Aim: Filter the data based on whether users took turns within the same post and who the partners (the dyad) were. Essentially, it boils down to filtering rows based on the values of other columns.

Specifically, I thought it would start by checking whether 'turntaking'==1, and then keeping the observations with the same 'dyad_id' within the same 'post_id'. I'm having trouble working out how to filter by multiple conditions.

Example data:

structure(list(post_id = c(100, 230, 100, 100, 100, 100), dyad_id = structure(c(2L, 
2L, 2L, 1L, 1L, 1L), .Label = c("42_27", "53_27"), class = "factor"), 
    dyad_id_order = structure(c(4L, 4L, 2L, 3L, 1L, 3L), .Label = c("27_42", 
    "27_53", "42_27", "53_27"), class = "factor"), turntaking = c(0, 
    0, 1, 0, 1, 1)), class = "data.frame", row.names = c(NA, 
-6L), variable.labels = structure(character(0), .Names = character(0)), codepage = 65001L)

Visually, the example data look like this:

╔═════════╦═════════╦═══════════════╦════════════╦══════════════════════════════════════════════════════════╗
║ post_id ║ dyad_id ║ dyad_id_order ║ turntaking ║ (note)                                                   ║
╠═════════╬═════════╬═══════════════╬════════════╬══════════════════════════════════════════════════════════╣
║   100   ║  53_27  ║     53_27     ║      0     ║ Keep                                                     ║
╠═════════╬═════════╬═══════════════╬════════════╬══════════════════════════════════════════════════════════╣
║   230   ║  53_27  ║     53_27     ║      0     ║ Drop                                                     ║
╠═════════╬═════════╬═══════════════╬════════════╬══════════════════════════════════════════════════════════╣
║   100   ║  53_27  ║     27_53     ║      1     ║ Keep: ID27 responded to ID53's response in the first row ║
║         ║         ║               ║            ║ (They are both found under the same post_id)             ║
╠═════════╬═════════╬═══════════════╬════════════╬══════════════════════════════════════════════════════════╣
║   100   ║  42_27  ║     42_27     ║      0     ║ Keep                                                     ║
╠═════════╬═════════╬═══════════════╬════════════╬══════════════════════════════════════════════════════════╣
║   100   ║  42_27  ║     27_42     ║      1     ║ Keep                                                     ║
╠═════════╬═════════╬═══════════════╬════════════╬══════════════════════════════════════════════════════════╣
║   100   ║  42_27  ║     42_27     ║      1     ║ Keep                                                     ║
╚═════════╩═════════╩═══════════════╩════════════╩══════════════════════════════════════════════════════════╝

The final output would look like:

╔═════════╦═════════╦═══════════════╦════════════╗
║ post_id ║ dyad_id ║ dyad_id_order ║ turntaking ║
╠═════════╬═════════╬═══════════════╬════════════╣
║   100   ║  53_27  ║     53_27     ║      0     ║
╠═════════╬═════════╬═══════════════╬════════════╣
║   100   ║  53_27  ║     27_53     ║      1     ║
╠═════════╬═════════╬═══════════════╬════════════╣
║   100   ║  42_27  ║     42_27     ║      0     ║
╠═════════╬═════════╬═══════════════╬════════════╣
║   100   ║  42_27  ║     27_42     ║      1     ║
╠═════════╬═════════╬═══════════════╬════════════╣
║   100   ║  42_27  ║     42_27     ║      1     ║
╚═════════╩═════════╩═══════════════╩════════════╝
user14250906
  • You may want to refer to this post for the [answer](https://stackoverflow.com/questions/1686569/filter-data-frame-rows-by-a-logical-condition) – nyk Feb 16 '21 at 00:38
  • Thank you @nyk, but I think this is a different question because we need to filter based on the value of multiple other columns, not within the same column. – user14250906 Feb 16 '21 at 00:40
  • It is probably because your filtering criteria haven't been pinned down. Specifically, you may want to clarify the requirement "who the partners were". Does it mean each post ID is tied to a unique dyad_id? Your output suggests that is not the case. How do you decide which dyad_id to keep in each post ID? – nyk Feb 16 '21 at 01:01

1 Answer

This groups the rows by each post_id-dyad_id combination and keeps only the groups in which turntaking was flagged at least once.

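  library(dplyr)

  # my_data is the data frame from the question
  my_data %>%
    group_by(post_id, dyad_id) %>%   # one group per dyad within each post
    filter(sum(turntaking) > 0) %>%  # keep groups with at least one flagged row
    ungroup()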

# A tibble: 5 x 4
  post_id dyad_id dyad_id_order turntaking
    <dbl> <fct>   <fct>              <dbl>
1     100 53_27   53_27                  0
2     100 53_27   27_53                  1
3     100 42_27   42_27                  0
4     100 42_27   27_42                  1
5     100 42_27   42_27                  1
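
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the example data with plain factors rather than the original dput output (the name my_data is just a placeholder), and it uses any(turntaking == 1) as an assumed alternative to sum(turntaking) > 0. For a 0/1 column with no missing values the two conditions select the same groups.

```r
library(dplyr)

# Example data from the question, rebuilt by hand (my_data is a placeholder name)
my_data <- data.frame(
  post_id       = c(100, 230, 100, 100, 100, 100),
  dyad_id       = factor(c("53_27", "53_27", "53_27", "42_27", "42_27", "42_27")),
  dyad_id_order = factor(c("53_27", "53_27", "27_53", "42_27", "27_42", "42_27")),
  turntaking    = c(0, 0, 1, 0, 1, 1)
)

kept <- my_data %>%
  group_by(post_id, dyad_id) %>%      # one group per dyad within each post
  filter(any(turntaking == 1)) %>%    # keep the whole group if any row is flagged
  ungroup()

print(kept)
```

The two conditions can differ when turntaking contains NA: sum() returns NA for such a group unless na.rm = TRUE is set, so filter() drops it, whereas any() still returns TRUE if at least one row equals 1.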
Jon Spring