I am looking to make rolling counts for multiple sites of instances that a threshold is exceeded.
A simplified version of my data:
Dates SiteID Value
1 2015-04-01 A 9.1
2 2015-04-02 A 8.8
3 2015-04-02 A 7.9
4 2015-04-03 A 9.2
5 2015-04-08 A 9.3
6 2015-04-11 A 8.9
7 2015-04-11 A 9.2
8 2015-04-13 A 9.1
9 2015-04-16 A 7.8
10 2015-04-01 B 9.1
11 2015-04-01 B 8.8
12 2015-04-04 B 9.9
13 2015-04-05 B 7.8
14 2015-04-06 B 9.8
15 2015-04-06 B 9.2
16 2015-04-07 B 9.1
17 2015-04-08 B 8.5
18 2015-04-15 B 9.1
If the rolling period is 3 days and the threshold for 'Value' is 9, I am looking for a new column, 'Exceedances', that counts the number of times 'Value' was greater than 9 in the last 3 days at a given 'SiteID'. So this would look like:
Dates SiteID Value Exceedances
1 2015-04-01 A 9.1 1
2 2015-04-02 A 8.8 1
3 2015-04-02 A 7.9 1
4 2015-04-03 A 9.2 2
5 2015-04-08 A 9.3 1
6 2015-04-11 A 8.9 0
7 2015-04-11 A 9.2 1
8 2015-04-13 A 9.1 2
9 2015-04-16 A 7.8 0
10 2015-04-01 B 9.1 1
11 2015-04-01 B 8.8 1
12 2015-04-04 B 9.9 1
13 2015-04-05 B 7.8 1
14 2015-04-06 B 9.8 2
15 2015-04-06 B 9.2 3
16 2015-04-07 B 9.1 3
17 2015-04-08 B 8.5 3
18 2015-04-15 B 9.1 1
DT = structure(list(r = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 1L,
2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L), Dates = structure(c(16526, 16527,
16527, 16528, 16533, 16536, 16536, 16538, 16541, 16526, 16526,
16529, 16530, 16531, 16531, 16532, 16533, 16540), class = "Date"),
SiteID = c("A", "A", "A", "A", "A", "A", "A", "A", "A", "B",
"B", "B", "B", "B", "B", "B", "B", "B"), Value = c(9.1, 8.8,
7.9, 9.2, 9.3, 8.9, 9.2, 9.1, 7.8, 9.1, 8.8, 9.9, 7.8, 9.8,
9.2, 9.1, 8.5, 9.1), Exceedances = c(1L, 1L, 1L, 2L, 1L,
0L, 1L, 2L, 0L, 1L, 1L, 1L, 1L, 2L, 3L, 3L, 3L, 1L)), .Names = c("r",
"Dates", "SiteID", "Value", "Exceedances"), row.names = c(NA,
-18L), class = "data.frame")
I have seen similar questions that use data.table and deplyr but none have addressed counting exceedances of thresholds.
Ultimately this will be applied to very large datasets so methods that will be fastest are appreciated. And in case this makes a difference on recommendations, I will also be applying this for a rolling year rather than the 3 day example above, and the dataset will contain 'NA's.