2

I have a question related to filtering on dates in R. I found e.g. this link dplyr filter on Date, which answers the question how to filter with help of dplyr in a specific date range. I would like to select a dynamic range, e.g. calculate the number of critical Jobs in a specific window e.g. the last seven days starting from the current date in the dataset. The code I have in mind would look something like this:

my.data %>% 
group_by(category) %>% 
filter(date > date - days(7) & date <= date) %>% 
mutate(ncrit = sum(critical == 'yes'))

This is not working properly. Is there a way to get this running with dplyr?

Edit:

Apologies for the unclear post. To complete the post first the idea: imagine computers running jobs. If a computer fails to compute jobs the past x days it is more likely that it also fails in calculating the current job. A dummy dataset includes the computer categories (e.g. A/B), the date, and failure (yes/no)

Using the dataset from Rui Barradas, I would like to add with dplyr the following column 'number of critical Jobs in past 3 days" (in this case x = 3):

head(my.data, 7)
  category       date critical number of critical jobs in past 3 days
1        A 2018-08-14      yes                                     NA
2        A 2018-08-15       no                                     NA
3        A 2018-08-16      yes                                     NA
4        A 2018-08-17       no                                      2
5        A 2018-08-18      yes                                      1
6        A 2018-08-19       no                                      2
7        A 2018-08-20      yes                                      1

Data (Rui Barradas):

set.seed(3635)
my.data <- data.frame(category = rep(c('A', 'B'), each = 10), #
                  date = rep(seq(Sys.Date() - 9, Sys.Date(), by = 'days')),
                  critical = sample(c('no', 'yes'), 20, TRUE))
Mus
  • 7,290
  • 24
  • 86
  • 130
Ollinator
  • 65
  • 4
  • 2
    Please include sample data for `my.data`. Are entries in column `date` date objects (e.g. `POSIXct`)? Also explain why the code is not running properly. Are there errors? Warnings? Is the result not the expected result? – Maurits Evers Aug 23 '18 at 06:17
  • 2
    Note that both `date > date - days(7)` and `date <= date` are always `TRUE`. – Rui Barradas Aug 23 '18 at 06:20
  • Are you now summing the values `"no"`? And if it so, in row 3 shouldn't it be `1`? – Rui Barradas Aug 23 '18 at 10:51
  • I'm summing the values "yes". At row 4 (2018-08-17) I'm summing over the days 14, 15, 16 (2 times yes). In row 3 I added NA since I wanted to have exactly three days available but you are right one could change it to 1. – Ollinator Aug 23 '18 at 11:02
  • @Ollinator See if the edit to the answer solves it. – Rui Barradas Aug 23 '18 at 11:27
  • @Rui Barradas. First of all thx for your solution, its already very helpful. It reproduces the result stated in the question. Maybe you have some Suggestion using the date column. In the data.frame the dates are perfectly linear. In real world it might be that there are some gaps:. e.g. 2018-09-01, 2018-09-15, 2018-09-16. Nobody knows what happenened in the long shutdown between the first two dates. For this reason it should be not included in the calculation. – Ollinator Aug 23 '18 at 11:53
  • @Ollinator you are very welcome :) – Sal-laS Aug 23 '18 at 11:55

3 Answers3

1

Without an example dataset it's not very easy to say, but given your description of the problem I believe the following is on the right track.
The code uses function rollapplyr from package zoo, inspired not by the accepted but by the second answer to this question.

library(zoo)
library(dplyr)

sumCrit <- function(DF, crit = "yes", window = 3){
    DF %>%
        group_by(category) %>%
        mutate(ncrit = rollapplyr(critical == crit, list(-seq(3)), sum, fill = NA))
}


result <- sumCrit(my.data)

head(result, 7)
## A tibble: 7 x 4
## Groups:   category [1]
#  category date       critical ncrit
#  <fct>    <date>     <fct>    <int>
#1 A        2018-08-14 yes         NA
#2 A        2018-08-15 no          NA
#3 A        2018-08-16 yes         NA
#4 A        2018-08-17 no           2
#5 A        2018-08-18 yes          1
#6 A        2018-08-19 no           2
#7 A        2018-08-20 yes          1

Data.

This is a made up dataset meant to test the code above.

set.seed(3635)    # Make the results reproducible
my.data <- data.frame(category = rep(c("A", "B"), each = 10),
                      date = rep(seq(Sys.Date() - 9, Sys.Date(), by = "days"), 2),
                      critical = sample(c("no", "yes"), 20, TRUE))
Rui Barradas
  • 70,273
  • 8
  • 34
  • 66
0

Data generation

DATE1 <- as.Date("2018-08-23")
DATE2 <- as.Date("2018-07-23")

# creating a data range with the start and end date:
dates <- seq(DATE2, DATE1, by="days")

dt<-data.frame(category=sample(1:6,32,replace = T),deadline=dates)

Filter the dates

library("tidyverse")
dt %>% 
  group_by(category) %>% 
  filter(deadline %in% seq(Sys.Date()-7,Sys.Date() , by="days") )
Sal-laS
  • 11,016
  • 25
  • 99
  • 169
0

Using the dataset that Rui Barradas created, providing a lubridate formulation, using intervals

set.seed(3635)    # Make the results reproducible
my.data <- data.frame(category = rep(c("A", "B"), each = 10),
                      date = rep(seq(Sys.Date() - 9, Sys.Date(), by = "days"), 2),
                      critical = sample(c("no", "yes"), 20, TRUE))


library(lubridate) #use lubridate to create intervals
 INT_check<-interval(Sys.Date()-7,Sys.Date()) # 7 days from today
my.data %>% 
  filter(date %within% INT_check ) %>% 
  group_by(category)%>%
  summarise(ncrit = sum(critical == 'yes'))

you can also specify INT_Check as

   INT_check<-interval("2018-08-16","2018-08-18") # if you want to use absolute dates
    INT_check<-interval("2018-08-16",Sys.Date()) # if you want to specify just absolute start date
ashleych
  • 1,042
  • 8
  • 25