I want to identify the unique people who get an apple in a defined timeframe. I did this by creating a binary indicator "apples" as follows.
names <- c("tom", "mary", "tom", "john", "mary", "tom", "john", "mary", "john", "mary", "tom", "mary", "john", "john")
dates <- as.Date(c("2010-02-01", "2010-05-01", "2010-03-01", "2010-07-01", "2010-07-01", "2010-06-01", "2010-09-01", "2010-07-01", "2010-11-01", "2010-09-01", "2010-08-01", "2010-11-01", "2010-12-01", "2011-01-01"))
fruit <- c("apple", "orange", "banana", "kiwi", "apple", "apple", "apple", "orange", "banana", "apple", "kiwi", "apple", "orange", "apple")
age <- c(60, 55, 60, 57, 55, 60, 57, 55, 57, 55, 60, 55, 57, 57)
sex <- c("m", "f", "m", "m", "f", "m", "m", "f", "m", "f", "m", "f", "m", "m")
df <- data.frame(names, dates, age, sex, fruit)
df
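# 1 if the row is an apple dated from 2010-04-01 up to (but not including) 2010-10-01, else 0
df$apples <- ifelse(df$fruit == "apple" & df$dates >= as.Date("2010-04-01") & df$dates < as.Date("2010-10-01"), 1, 0)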
df
   names      dates age sex  fruit apples
1    tom 2010-02-01  60   m  apple      0
2   mary 2010-05-01  55   f orange      0
3    tom 2010-03-01  60   m banana      0
4   john 2010-07-01  57   m   kiwi      0
5   mary 2010-07-01  55   f  apple      1
6    tom 2010-06-01  60   m  apple      1
7   john 2010-09-01  57   m  apple      1
8   mary 2010-07-01  55   f orange      0
9   john 2010-11-01  57   m banana      0
10  mary 2010-09-01  55   f  apple      1
11   tom 2010-08-01  60   m   kiwi      0
12  mary 2010-11-01  55   f  apple      0
13  john 2010-12-01  57   m orange      0
14  john 2011-01-01  57   m  apple      0
My problem is that Mary is in there twice. I only want the first date on which she got an apple in the specified timeframe (and everyone else's first date in the real data). I would like a second column called "apples1" that flags, for each person, the first date within the defined timeframe on which they got an apple.
Desired output:
   names      dates age sex  fruit apples apples1
1    tom 2010-02-01  60   m  apple      0       0
2   mary 2010-05-01  55   f orange      0       0
3    tom 2010-03-01  60   m banana      0       0
4   john 2010-07-01  57   m   kiwi      0       0
5   mary 2010-07-01  55   f  apple      1       1
6    tom 2010-06-01  60   m  apple      1       1
7   john 2010-09-01  57   m  apple      1       1
8   mary 2010-07-01  55   f orange      0       0
9   john 2010-11-01  57   m banana      0       0
10  mary 2010-09-01  55   f  apple      1       0
11   tom 2010-08-01  60   m   kiwi      0       0
12  mary 2010-11-01  55   f  apple      0       0
13  john 2010-12-01  57   m orange      0       0
14  john 2011-01-01  57   m  apple      0       0
I've been searching, and the closest thing I found is the question "Select only the first rows for each unique value of a column in R", but it doesn't address unique IDs. I've also come across !duplicated, but I don't want to remove Mary's rows, because I need her other dates to remain so I can follow up on her. I'm probably missing something really fundamental here, so apologies in advance.
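For reference, here is a minimal sketch of one way the "apples1" flag might be built in base R without dropping any rows. It assumes that !duplicated is applied only to the row indices where apples == 1, sorted by date, rather than to the whole data frame. The helper names hits and first_hits are made up for illustration, and if a person has two qualifying apples on the same first date, only the one that appears earlier in the data is flagged.

```r
# Rebuild the example data so the sketch runs on its own
names <- c("tom", "mary", "tom", "john", "mary", "tom", "john", "mary", "john", "mary", "tom", "mary", "john", "john")
dates <- as.Date(c("2010-02-01", "2010-05-01", "2010-03-01", "2010-07-01", "2010-07-01", "2010-06-01", "2010-09-01", "2010-07-01", "2010-11-01", "2010-09-01", "2010-08-01", "2010-11-01", "2010-12-01", "2011-01-01"))
fruit <- c("apple", "orange", "banana", "kiwi", "apple", "apple", "apple", "orange", "banana", "apple", "kiwi", "apple", "orange", "apple")
df <- data.frame(names, dates, fruit)
df$apples <- ifelse(df$fruit == "apple" & df$dates >= as.Date("2010-04-01") & df$dates < as.Date("2010-10-01"), 1, 0)

# Row indices of apples inside the timeframe (hypothetical helper name)
hits <- which(df$apples == 1)

# Sort those indices by date so each person's earliest qualifying row comes first
hits <- hits[order(df$dates[hits])]

# Keep only the first occurrence of each name among the sorted indices
first_hits <- hits[!duplicated(df$names[hits])]

# Flag those rows; every other row stays 0 and no rows are removed
df$apples1 <- 0
df$apples1[first_hits] <- 1
df
```

Because the flag is written back by row index, the original row order and all of Mary's other dates are left in place.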