I have a panel data frame like this:

test <- data.frame(id = c("A", "A", "A", "B", "B"), year = c("2014", "2015", "2016", "2014", "2015"), income = c("100", "150", "200", "300", "200"))

Suppose 2015 is when the "treatment" took place.

Now, I would like to keep only those ids for which data are available both before and after 2015. In other words, the expected data frame should exclude all of B's rows, since B has no observation after 2015. Any input would be highly appreciated. Thanks.

Anup
  • Not sure this is a duplicate. It's a bit more involved than just a row-by-row selection. `test %>% group_by(id) %>% filter(any(year < 2015) & any(year > 2015))` was the best I could come up with - @MauritsEvers – thelatemail Sep 07 '22 at 03:50
  • I think this is a dupe of [Filter data.frame rows by a logical condition](https://stackoverflow.com/questions/1686569/filter-data-frame-rows-by-a-logical-condition); had closed this earlier but then reopened as not sure. In your case, this is `library(dplyr); test %>% group_by(id) %>% filter(sum(year %in% c(2014, 2016)) == 2)`. – Maurits Evers Sep 07 '22 at 03:52
  • @thelatemail Yup I agree with you; too trigger-happy and re-opened. – Maurits Evers Sep 07 '22 at 03:53
  • I think `library(dplyr); test %>% group_by(id) %>% filter(all(year == c("2014", "2015", "2016")))` could be another option – jared_mamrot Sep 07 '22 at 03:59
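
The group-wise condition discussed in the comments (keep an id only if it has at least one year before and one year after the treatment) can also be sketched in base R, without dplyr. This is a minimal sketch rather than a definitive solution: the names `treatment_year`, `ids_before`, `ids_after`, `keep_ids` and `balanced` are introduced here for illustration, and converting `year` and `income` to numbers is an assumption about what the asker intends, since the sample data stores them as strings.

```r
# Sample panel from the question; year and income are stored as strings there.
test <- data.frame(
  id     = c("A", "A", "A", "B", "B"),
  year   = c("2014", "2015", "2016", "2014", "2015"),
  income = c("100", "150", "200", "300", "200")
)

# Assumption: treat year and income as numbers so comparisons are numeric.
# as.character() first guards against the columns being factors.
test$year   <- as.integer(as.character(test$year))
test$income <- as.numeric(as.character(test$income))

treatment_year <- 2015L  # hypothetical name; the value comes from the question

# ids observed at least once before, and at least once after, the treatment year
ids_before <- unique(test$id[test$year < treatment_year])
ids_after  <- unique(test$id[test$year > treatment_year])

# Keep only ids that appear in both sets, then subset the rows
keep_ids <- intersect(ids_before, ids_after)
balanced <- test[test$id %in% keep_ids, ]

print(balanced)
```

On the sample data this keeps the three rows for A and drops both rows for B, because B has no observation after 2015. Unlike a check against a fixed set of years, this condition does not require every id to have exactly 2014, 2015 and 2016.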

0 Answers