Select row prior to first occurrence of an event by group

Question

I have a series of observations that describe if and when an animal is spotted in a specific area. The following sample table identifies when a certain animal is seen (status == 1) or not (status == 0) by day.

   id       date status
1   1 2014-06-20      1
2   1 2014-06-21      1
3   1 2014-06-22      1
4   1 2014-06-23      1
5   1 2014-06-24      0
6   2 2014-06-20      1
7   2 2014-06-21      1
8   2 2014-06-22      0
9   2 2014-06-23      1
10  2 2014-06-24      1
11  3 2014-06-20      1
12  3 2014-06-21      1
13  3 2014-06-22      0
14  3 2014-06-23      1
15  3 2014-06-24      0
16  4 2014-06-20      1
17  4 2014-06-21      0
18  4 2014-06-22      0
19  4 2014-06-23      0
20  4 2014-06-24      1

Using the data.table package, I can identify the first day an animal is no longer seen in the area:

library(data.table)
dt <- as.data.table(df)
dt[status == 0, .SD[1], by = id]
  id       date status
1:  1 2014-06-24      0
2:  2 2014-06-22      0
3:  3 2014-06-22      0
4:  4 2014-06-21      0

While the above table is useful, I would like to know how to manipulate the function to find the dates prior to first occurrence of an animal's absence. In other words, I want to know the last day that each animal is in the area before temporarily leaving.

My actual data set bins these presence/absence observations into different time lengths depending on the situation (e.g. presence/absence by 3-hour intervals, 6-hour, etc). Therefore, it would be easier to access the previous row rather than subtract the time interval from each value since it always changes. My desired output would be the following:

  id       date status
1:  1 2014-06-23      1
2:  2 2014-06-21      1
3:  3 2014-06-21      1
4:  4 2014-06-20      1

Please feel free to use base code or other packages (i.e. dplyr) to answer this question, I am always up for something new. Thank you for your time!

score 9 · Answer 1 · edited May 23 '17 at 12:11

9

Try the following:

dt[dt[status == 0, .I[1] - 1, by = id]$V1]
#   id       date status
#1:  1 2014-06-23      1
#2:  2 2014-06-21      1
#3:  3 2014-06-21      1
#4:  4 2014-06-20      1

Incidentally, this method (using .I instead of .SD) will also be much faster. See this post for more on that.

edited May 23 '17 at 12:11

Community

1
1

answered Jun 26 '14 at 22:47

eddi

49,088
6
104
155

Thank you for the solution and reference. This has helped out immensely! – ccapizzano Jul 14 '14 at 15:58

score 4 · Answer 2 · answered Jun 26 '14 at 22:54

4

Here is a method via dplyr :

df %>% 
  group_by(id) %>%
  mutate(status_change = status - lead(status)) %>%
  filter(status_change == 1)
  id       date status status_change
1  1 2014-06-23      1             1
2  2 2014-06-21      1             1
3  3 2014-06-21      1             1
4  3 2014-06-23      1             1
5  4 2014-06-20      1             1

This takes advantage of status being a numeric variable. lead() accesses the next value; the change is 1 when and animal disappears.

answered Jun 26 '14 at 22:54

AndrewMacDonald

2,870
1
18
31

1

I haven't seen `lead` used before, very handy. To avoid adding a new column to the dataset you could do the math inside `filter`: `filter(status - lead(status) == 1)`. – aosmith Jun 26 '14 at 23:19
@AndrewMacDonald, Your solution is very helpful because it identifies all days prior to an animal disappearing, not just the first event. Thank you again! – ccapizzano Jul 14 '14 at 16:00

Select row prior to first occurrence of an event by group

2 Answers2

Linked