I have a dataset with programme participants, where I would like to remove clients who participated in two (or more) different programmes. I tried to work around this task by selecting clients with distinct IDs, and clients who participated more than once in the same programme, and then hoping to join these two subsets, but I keep failing. Here is my attempt on a small sample:
Client.ID = c(1, 2, 2, 3, 3)
Program = c("P1", "P2", "P2", "P1", "P2")
From = as.Date(c("2016-10-01", "2016-01-25", "2016-07-01", "2016-04-06", "2016-05-01"))
To = as.Date(c("2017-03-31", "2016-06-30", "2016-11-27", "2016-06-03", "2016-06-02"))
df = tibble(Client.ID, Program, From, To)
df
# A tibble: 5 x 4
Client.ID Program From To
<dbl> <chr> <date> <date>
1 1 P1 2016-10-01 2017-03-31
2 2 P2 2016-01-25 2016-06-30
3 2 P2 2016-07-01 2016-11-27
4 3 P1 2016-04-06 2016-06-03
5 3 P2 2016-05-01 2016-06-02
distinct(df, Client.ID, .keep_all = TRUE) #this selects distinct cases nicely
df[duplicated(df$Client.ID) & !duplicated(df$Program),] #but this does not work
I tried to look at other questions on the forum, but I only found ones, eg., here and here, that deal with duplicates based on two variables. I want to do something else: to remove cases that are duplicates in the first variable, but have different values in the second variable. Clients who participated in the same programme multiple times, should stay in the dataset.
Many thanks for any help!