I have a dataset similar to the following:
dt = structure(list(ID = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3,
4, 5, 5, 6, 6, 6, 6), date = structure(c(1332288000, 1332288000,
1360540800, 1384819200, 1384819200, 1325548800, 1326499200, 1365292800,
1365292800, 1365292800, 1400284800, 1442966400, 1450051200, 1404864000,
1330387200, 1330387200, 1366329600, 1366329600, 1412467200, 1412467200
), class = c("POSIXct", "POSIXt"), tzone = "UTC"), type = c("A",
"C", "B", "A", "B", "C", "C", "A", "B", "C", "C", "A", "A", "C",
"C", "C", "C", "B", "B", "A")), row.names = c(NA, -20L), class = c("tbl_df",
"tbl", "data.frame"))
Each row records an individual (ID), a date on which they appear in a system (date), and the type of event (type). The rows are ordered first by ID, then by date. An individual can appear on multiple dates and can have multiple event types on the same date.
I am trying to create an additional column (first) that flags the first date on which an individual appears. It should be 1 for every row that falls on that individual's first appearance date (not just the first row they appear in) and 0 otherwise. This is what I am after:
    ID       date type first
 1:  1 2012-03-21    A     1
 2:  1 2012-03-21    C     1
 3:  1 2013-02-11    B     0
 4:  1 2013-11-19    A     0
 5:  1 2013-11-19    B     0
 6:  2 2012-01-03    C     1
 7:  2 2012-01-14    C     0
 8:  2 2013-04-07    A     0
 9:  2 2013-04-07    B     0
10:  2 2013-04-07    C     0
11:  2 2014-05-17    C     0
12:  3 2015-09-23    A     1
13:  3 2015-12-14    A     0
14:  4 2014-07-09    C     1
15:  5 2012-02-28    C     1
16:  5 2012-02-28    C     1
17:  6 2013-04-19    C     1
18:  6 2013-04-19    B     1
19:  6 2014-10-05    B     0
20:  6 2014-10-05    A     0
I have seen solutions for identifying first appearances/rows here and here, for example. But these are not what I am after, since I need to take both ID and date into account. I have attempted to use the duplicated function inside data.table with by set to ID and date, but this flags the first row of every unique ID/date combination rather than all rows on each ID's earliest date:
dt[!duplicated(dt, by = c("ID", "date")), first := 1]
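To make the mismatch concrete, here is a small base R sketch on a hypothetical subset (the first four rows for ID 1, with dates stored as Date rather than POSIXct for brevity). The names toy, attempt, and wanted are made up for this illustration. attempt mimics the logic of the call above, and wanted is copied by hand from the desired output:

```r
# Hypothetical subset: first four rows of ID 1 from the data above,
# with dates as Date instead of POSIXct to keep the example short.
toy <- data.frame(
  ID   = c(1, 1, 1, 1),
  date = as.Date(c("2012-03-21", "2012-03-21", "2013-02-11", "2013-11-19")),
  type = c("A", "C", "B", "A")
)

# Same logic as the data.table call above: flag the first row of each
# ID/date combination. (0 here stands in for the NA that := leaves on
# the rows it does not touch.)
attempt <- as.integer(!duplicated(toy[c("ID", "date")]))

# Values copied by hand from the desired output shown earlier.
wanted <- c(1L, 1L, 0L, 0L)

# Show both columns next to the data for comparison.
print(cbind(toy, attempt, wanted))
```

On this subset the attempt misses the second row on the first date and wrongly flags the first row of each later date, which is the behaviour I am trying to avoid.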
Any help would be greatly appreciated, especially solutions using data.table or base R.
Thanks in advance