Subset data for specific values in a variable in R

Question

I have a panel dataset in long format that looks something like this:

idpers <- c(1040, 1040, 1041, 1041, 1041, 1232, 1277, 1277, 1277, 1277)
wave <- c(2012, 2013, 2012, 2013, 2014, 2011, 2011, 2012, 2013, 2014)
df <- data.frame(idpers, wave)

where idpers is an interviewee ID and wave indicates the wave/year in which the survey was conducted.

I would like to test the effect of a treatment that took place in, say, 2013, so I want to subset my data frame to only those participants who have both pre- and post-treatment observations. In other words, I want to keep the rows for an idpers only if that same idpers has rows both before and after/during the 2013 wave. I tried plenty of things like this:

df %>%
  group_by(idpers) %>%
  filter(wave %in% c(2011, 2012, 2013, 2014))

But this keeps every row whose wave is one of those values, regardless of which other waves that idpers has.

I hope that was clear and I'm happy to give more details! Thanks a lot!

score 3 · Accepted Answer · answered Apr 27 '20 at 10:04

I think you are looking for:

library(dplyr)
df %>% group_by(idpers) %>% filter(any(wave < 2013) && any(wave > 2013))

#  idpers  wave
#   <dbl> <dbl>
#1   1041  2012
#2   1041  2013
#3   1041  2014
#4   1277  2011
#5   1277  2012
#6   1277  2013
#7   1277  2014

This keeps every idpers that has at least one wave before 2013 and at least one wave after 2013.
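The question mentions observations "after/during" the 2013 wave, so here is a minimal sketch of a variant that counts the treatment wave itself as post-treatment. It assumes that reading of the question is what is wanted; `treatment_year` and `kept` are hypothetical names introduced only for this illustration.

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  idpers = c(1040, 1040, 1041, 1041, 1041, 1232, 1277, 1277, 1277, 1277),
  wave   = c(2012, 2013, 2012, 2013, 2014, 2011, 2011, 2012, 2013, 2014)
)

# Hypothetical name for the cutoff year (not part of the original answer)
treatment_year <- 2013

# Keep an idpers only if it has at least one wave before the treatment year
# and at least one wave in or after it (note `>=` instead of `>`)
kept <- df %>%
  group_by(idpers) %>%
  filter(any(wave < treatment_year) && any(wave >= treatment_year)) %>%
  ungroup()

unique(kept$idpers)
```

With `>=`, idpers 1040 (waves 2012 and 2013) is kept as well, whereas the strict `>` in the answer above drops it. Which comparison is right depends on whether the 2013 wave was surveyed before or after the treatment.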

Ronak Shah

Perfect, it works. Just a follow-up: is it the same using & or &&? It works both ways. – AntVal Apr 27 '20 at 11:07

In this case both `&` and `&&` work, because each any() call returns a single value. `&` is the vectorized operator (it compares element by element), whereas `&&` is meant for single (scalar) logical values. Here is a nice post explaining the difference: https://stackoverflow.com/questions/6558921/boolean-operators-and – Ronak Shah Apr 27 '20 at 11:22
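A small base-R sketch of the difference, using made-up vectors (`x`, `y`, and the result names are only for illustration). As far as I know, from R 4.3.0 onward `&&` raises an error when either operand has length greater than one, while older versions silently used only the first element, so the sketch only passes length-one values to `&&`.

```r
x <- c(TRUE, TRUE, FALSE)
y <- c(TRUE, FALSE, FALSE)

# `&` is element-wise: one result per pair of elements
elementwise <- x & y

# any() collapses a vector to a single TRUE/FALSE, so `&&` is a natural fit
wave <- c(2012, 2013, 2014)
both_sides <- any(wave < 2013) && any(wave > 2013)

# `&&` short-circuits: the right-hand side is not evaluated
# when the left-hand side is already FALSE
short <- FALSE && stop("never reached")

print(elementwise)
print(both_sides)
print(short)
```

Inside `filter()`, the condition built from two `any()` calls is a single TRUE or FALSE per group, which is why swapping `&&` for `&` gives the same result there.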
