How do you keep only paired rows in a dataframe?

Question

I have a dataset which looks like this:

Names     Subject Trial
A0100_1   A0100   1
A0100_2   A0100   2
A0102_1   A0102   1
A0103_1   A0103   1
A0103_2   A0103   2

I want to keep only the rows for subjects who have both trial 1 and trial 2. In the example above, that means dropping A0102, which only has trial 1. Thanks in advance!

Hi, what do trial 1 and trial 2 contain: 0 and 1? It is rather hard to tell what you are asking for here. Could you provide a small reproducible example? — Félix Cuneo, Oct 11 '19 at 12:36

score 0 · Accepted Answer · answered Oct 11 '19 at 12:42

Here is one possibility using the tidyverse package:

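library(tidyverse)

mydata <- data.frame(subject = c("A0100", "A0100", "A0102", "A0103", "A0103"),
                     Trial = c(1,2,1,1,2))

mydata %>% 
  mutate(dummy = 1) %>%                 # marker: this subject/trial combination exists
  spread(Trial, dummy) %>%              # one column per trial; a missing trial becomes NA
  filter(`1` == `2`) %>%                # NA comparisons are dropped, so only complete pairs remain
  gather(trial, dummy, - subject) %>%   # back to one row per subject and trial
  select(-dummy)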

  subject trial
  <chr>   <chr>
1 A0100   1    
2 A0103   1    
3 A0100   2    
4 A0103   2
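
As a side note, `spread()` and `gather()` are superseded in tidyr 1.0.0 and later by `pivot_wider()` and `pivot_longer()`. A minimal sketch of the same reshaping idea with the newer functions might look like the following; the column name `present` and the result name `paired_wide` are made up for illustration, and `Trial` comes back as a character column because it passes through the column names.

```r
library(dplyr)
library(tidyr)

mydata <- data.frame(subject = c("A0100", "A0100", "A0102", "A0103", "A0103"),
                     Trial = c(1, 2, 1, 1, 2))

paired_wide <- mydata %>%
  mutate(present = 1) %>%                                   # marker for an existing subject/trial row
  pivot_wider(names_from = Trial, values_from = present) %>% # columns `1` and `2`, NA where a trial is missing
  filter(!is.na(`1`), !is.na(`2`)) %>%                      # keep subjects that have both trials
  pivot_longer(-subject, names_to = "Trial", values_to = "present") %>%
  select(-present)

paired_wide
```

This is only a sketch under the assumption that the trials are exactly 1 and 2; with more trial values the filter step would need to change.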

Alternatively (and a bit shorter), you can count the rows per subject with the count function and then do a semi join to keep only the subjects that have two rows:

mydata %>% 
  count(subject) %>%
  filter(n == 2) %>%
  semi_join(mydata, ., by = "subject")

# A tibble: 4 x 2
  subject Trial
  <chr>   <dbl>
1 A0100       1
2 A0100       2
3 A0103       1
4 A0103       2
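
Note that `n == 2` assumes each subject/trial combination appears at most once. If that might not hold, one possible variation is to check for both trial values directly within each subject. The sketch below assumes the trials of interest are exactly 1 and 2; the name `paired` is only illustrative.

```r
library(dplyr)

mydata <- data.frame(subject = c("A0100", "A0100", "A0102", "A0103", "A0103"),
                     Trial = c(1, 2, 1, 1, 2))

paired <- mydata %>%
  group_by(subject) %>%
  filter(all(c(1, 2) %in% Trial)) %>%  # TRUE only if the subject has both trial 1 and trial 2
  ungroup()

paired
```

With the example data this keeps the rows for A0100 and A0103 and drops A0102. Unlike the reshaping approach, it leaves `Trial` numeric and preserves the original row order.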
