3

I have a series of observations that describe if and when an animal is spotted in a specific area. The following sample table identifies when a certain animal is seen (status == 1) or not (status == 0) by day.

   id       date status
1   1 2014-06-20      1
2   1 2014-06-21      1
3   1 2014-06-22      1
4   1 2014-06-23      1
5   1 2014-06-24      0
6   2 2014-06-20      1
7   2 2014-06-21      1
8   2 2014-06-22      0
9   2 2014-06-23      1
10  2 2014-06-24      1
11  3 2014-06-20      1
12  3 2014-06-21      1
13  3 2014-06-22      0
14  3 2014-06-23      1
15  3 2014-06-24      0
16  4 2014-06-20      1
17  4 2014-06-21      0
18  4 2014-06-22      0
19  4 2014-06-23      0
20  4 2014-06-24      1

Using the data.table package, I can identify the first day an animal is no longer seen in the area:

library(data.table)
dt <- as.data.table(df)
dt[status == 0, .SD[1], by = id]
  id       date status
1:  1 2014-06-24      0
2:  2 2014-06-22      0
3:  3 2014-06-22      0
4:  4 2014-06-21      0

While the above table is useful, I would like to know how to manipulate the function to find the dates prior to first occurrence of an animal's absence. In other words, I want to know the last day that each animal is in the area before temporarily leaving.

My actual data set bins these presence/absence observations into different time lengths depending on the situation (e.g. presence/absence by 3-hour intervals, 6-hour, etc). Therefore, it would be easier to access the previous row rather than subtract the time interval from each value since it always changes. My desired output would be the following:

  id       date status
1:  1 2014-06-23      1
2:  2 2014-06-21      1
3:  3 2014-06-21      1
4:  4 2014-06-20      1

Please feel free to use base code or other packages (i.e. dplyr) to answer this question, I am always up for something new. Thank you for your time!

ccapizzano
  • 1,556
  • 13
  • 20

2 Answers2

9

Try the following:

dt[dt[status == 0, .I[1] - 1, by = id]$V1]
#   id       date status
#1:  1 2014-06-23      1
#2:  2 2014-06-21      1
#3:  3 2014-06-21      1
#4:  4 2014-06-20      1

Incidentally, this method (using .I instead of .SD) will also be much faster. See this post for more on that.

Community
  • 1
  • 1
eddi
  • 49,088
  • 6
  • 104
  • 155
4

Here is a method via dplyr :

df %>% 
  group_by(id) %>%
  mutate(status_change = status - lead(status)) %>%
  filter(status_change == 1)
  id       date status status_change
1  1 2014-06-23      1             1
2  2 2014-06-21      1             1
3  3 2014-06-21      1             1
4  3 2014-06-23      1             1
5  4 2014-06-20      1             1

This takes advantage of status being a numeric variable. lead() accesses the next value; the change is 1 when and animal disappears.

AndrewMacDonald
  • 2,870
  • 1
  • 18
  • 31
  • 1
    I haven't seen `lead` used before, very handy. To avoid adding a new column to the dataset you could do the math inside `filter`: `filter(status - lead(status) == 1)`. – aosmith Jun 26 '14 at 23:19
  • @AndrewMacDonald, Your solution is very helpful because it identifies all days prior to an animal disappearing, not just the first event. Thank you again! – ccapizzano Jul 14 '14 at 16:00