I have a panel dataset in long format that looks something like this:

idpers <- c(1040, 1040, 1041, 1041, 1041, 1232, 1277, 1277, 1277, 1277)
wave <- c(2012, 2013, 2012, 2013, 2014, 2011, 2011, 2012, 2013, 2014)
df <- data.frame(idpers, wave)

where idpers is an interviewee ID and wave indicates the wave/year in which the survey was conducted.

I would like to test the effect of a treatment that took place in, say, 2013, so I want to subset my data frame to only those participants who have both pre- and post-treatment observations. That is, I want to keep the rows for an idpers only if that same idpers has rows both before and after/during the 2013 wave. I tried plenty of things like this:

library(dplyr)
df %>%
  group_by(idpers) %>%
  filter(wave %in% c(2011, 2012, 2013, 2014))

But this keeps every row whose wave matches any of those values.

I hope that was clear and I'm happy to give more details! Thanks a lot!

AntVal

1 Answer


I think you are looking for:

library(dplyr)
df %>% group_by(idpers) %>% filter(any(wave < 2013) && any(wave > 2013))

#  idpers  wave
#   <dbl> <dbl>
#1   1041  2012
#2   1041  2013
#3   1041  2014
#4   1277  2011
#5   1277  2012
#6   1277  2013
#7   1277  2014

This keeps every idpers that has at least one wave before 2013 and at least one wave after it.
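
If the 2013 wave itself should count as post-treatment (the question says "after/during"), the second condition would presumably need `>=` instead of `>`. Here is a minimal base R sketch of both variants, in case dplyr is not available. The names `treatment_year`, `keep_strict` and `keep_inclusive` are made up for illustration and are not part of the original post.

```r
# Sample data from the question
idpers <- c(1040, 1040, 1041, 1041, 1041, 1232, 1277, 1277, 1277, 1277)
wave <- c(2012, 2013, 2012, 2013, 2014, 2011, 2011, 2012, 2013, 2014)
df <- data.frame(idpers, wave)

treatment_year <- 2013  # assumed treatment wave (hypothetical name)

# ave() applies the function per idpers and recycles the single
# TRUE/FALSE result (stored as 1/0) to every row of that group
keep_strict <- ave(
  df$wave, df$idpers,
  FUN = function(w) any(w < treatment_year) && any(w > treatment_year)
) == 1

keep_inclusive <- ave(
  df$wave, df$idpers,
  FUN = function(w) any(w < treatment_year) && any(w >= treatment_year)
) == 1

df_strict <- df[keep_strict, ]        # post-treatment means strictly after 2013
df_inclusive <- df[keep_inclusive, ]  # the 2013 wave counts as post-treatment
```

With the sample data, the strict variant keeps idpers 1041 and 1277, while the inclusive variant also keeps 1040, whose waves are 2012 and 2013.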

Ronak Shah
  • Perfect, it works. Just a follow-up: is it the same using & or &&? It works both ways. – AntVal Apr 27 '20 at 11:07
  • In this case both `&` and `&&` would work, because each `any()` call returns a single TRUE/FALSE. However, `&` is vectorized (it compares element by element), whereas `&&` is meant for single logical values. Here is a nice post explaining the difference: https://stackoverflow.com/questions/6558921/boolean-operators-and – Ronak Shah Apr 27 '20 at 11:22
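
To illustrate the difference between the two operators, here is a small sketch with made-up vectors `x` and `y` (not from the original post). As far as I know, recent R versions (4.3.0 and later) raise an error when `&&` is given a vector longer than one, so the example only passes it single values.

```r
x <- c(TRUE, FALSE, TRUE)
y <- c(TRUE, TRUE, FALSE)

# `&` compares element by element and returns a vector
elementwise <- x & y

# `&&` expects one value on each side; any() collapses a vector to one value
scalar <- any(x) && any(y)

# With length-one inputs the two operators give the same result
same <- identical(any(x) & any(y), any(x) && any(y))
```

Here `elementwise` has three elements while `scalar` is a single value, which is why either operator works inside the `filter()` call from the answer: both sides are already reduced to one value by `any()`.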