I am not sure whether this is a bug or whether I am doing something wrong, so I will ask the question here and go from there.

Suppose we have a dummy data set with the number of calls per day:

df_calls = data.frame(Call_date= c("2019-02-18",
                                    "2019-02-19",                                               
                                    "2019-02-20",                                               
                                    "2019-02-22",                                              
                                    "2019-02-25",                                              
                                    "2019-02-26",                                              
                                    "2019-03-01",                                              
                                    "2019-03-04"),
                      Calls = c(12,4,2,8,1,3,1,8))

I now want to thicken this data set to see how many calls there were per week, starting from "2019-02-18".

Thus we have:

starting_day= as.Date("2019-02-18")

I would expect the week running from "2019-02-18" through "2019-02-24" to contain 12+4+2+8 = 26 calls.
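As a cross-check on that expectation, the weekly totals can be computed by hand in base R, without padr. This is only a sketch: the Week_start column and the integer-division approach are illustrative choices, not a description of what thicken does internally.

```r
# Same dummy data as above, with Call_date already parsed as Date
df_calls <- data.frame(
  Call_date = as.Date(c("2019-02-18", "2019-02-19", "2019-02-20", "2019-02-22",
                        "2019-02-25", "2019-02-26", "2019-03-01", "2019-03-04")),
  Calls = c(12, 4, 2, 8, 1, 3, 1, 8)
)
starting_day <- as.Date("2019-02-18")

# Whole weeks elapsed since starting_day: 0 for the first week, 1 for the second, ...
week_index <- as.integer(df_calls$Call_date - starting_day) %/% 7

# First day of the week each call date falls in (Week_start is a hypothetical name)
df_calls$Week_start <- starting_day + 7 * week_index

# Sum the calls per week
expected_weekly <- tapply(df_calls$Calls, df_calls$Week_start, sum)
print(expected_weekly)
```

On this data the week starting "2019-02-18" sums to 26, followed by 5 and 8 for the next two weeks.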

Let's have a look...

Using the padr library, I use the function thicken:

library(dplyr)  # for %>%, mutate(), group_by(), summarise()
library(padr)

df_calls_weekly = df_calls %>%
                   mutate(Call_date = as.Date(Call_date)) %>% 
                   thicken("week",colname = "Date_Week" ,start_val = starting_day) %>%
                   group_by(Date_Week) %>%  
                   summarise(Num_calls = sum(Calls)) %>%
                   ungroup()

Looking at df_calls_weekly we have the following output:

    # A tibble: 3 x 2
      Date_Week  Num_calls
      <date>         <dbl>
    1 2019-02-18        14
    2 2019-02-25         5
    3 2019-03-04         8

We get a different answer: for the week starting "2019-02-18" there are 14 calls, not 26.
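One quick way to confirm that calls have gone missing, rather than being assigned to a different week, is to compare the grand totals. The sketch below hard-codes the numbers shown above; the variable names are illustrative.

```r
# Calls per day from the dummy data set
calls <- c(12, 4, 2, 8, 1, 3, 1, 8)

# Weekly totals as returned by the thicken + summarise pipeline above
num_calls_weekly <- c(14, 5, 8)

# If every row were assigned to some week, these two sums would match
total_input  <- sum(calls)             # 39
total_weekly <- sum(num_calls_weekly)  # 27

missing_calls <- total_input - total_weekly
print(missing_calls)  # 12, the Calls value of the "2019-02-18" row
```

The shortfall of 12 is exactly the number of calls recorded on "2019-02-18", which points at that single row.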

Looking at the data frame that thicken creates, it appears to drop the row where Call_date == "2019-02-18", which you can see explicitly here:

df_calls_weekly = df_calls %>%
  mutate(Call_date = as.Date(Call_date)) %>% 
  thicken("week",colname = "Date_Week" ,start_val = starting_day) %>%
  filter(Date_Week ==starting_day )

with output:

   Call_date Calls  Date_Week
1 2019-02-19     4 2019-02-18
2 2019-02-20     2 2019-02-18
3 2019-02-22     8 2019-02-18

For whatever reason, it is ignoring the Call_date of "2019-02-18". I presume this is because that call date is the same as the start_val date passed to thicken.

If anyone knows how to get thicken to include dates that are equal to its start_val parameter, I would be very appreciative.

1 Answer

The start_val argument to thicken is described as:

By default the first instance of interval that is lower than the lowest value of the input datetime variable, with all time units on default value.

The function assumes that all values of the datetime variable are strictly greater than start_val. Your lowest value is equal to start_val, so that row is ignored.
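The effect of a strict comparison on the boundary date can be illustrated in base R. This only mimics the behaviour described above under that assumption; it is not padr's source code, and the variable names are made up for the example.

```r
call_dates <- as.Date(c("2019-02-18", "2019-02-19", "2019-02-20", "2019-02-22"))
start_val  <- as.Date("2019-02-18")

# Strict comparison: a date equal to start_val fails the test
keep_strict <- call_dates > start_val
print(call_dates[keep_strict])

# Inclusive comparison: the boundary date would survive
keep_inclusive <- call_dates >= start_val
print(call_dates[keep_inclusive])
```

With the strict test, three of the four dates remain, matching the three rows seen in the filtered output of the question.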

Here's a fix:

df_calls = data.frame(Call_date= c("2019-02-18",
                                   "2019-02-19",                                               
                                   "2019-02-20",                                               
                                   "2019-02-22",                                              
                                   "2019-02-25",                                              
                                   "2019-02-26",                                              
                                   "2019-03-01",                                              
                                   "2019-03-04"),
                      Calls = c(12,4,2,8,1,3,1,8))

starting_day= as.POSIXct("2019-02-17 23:59:59") # a second before the minimum date
library(tidyverse)
library(padr)

df_calls_weekly = df_calls %>%
  mutate(Call_date = as.Date(Call_date)) %>% 
  thicken("week",colname = "Date_Week", start_val = starting_day) %>% 
  group_by(Date_Week) %>%  
  summarise(Num_calls = sum(Calls)) %>%
  ungroup() %>%
  mutate(Date_Week = Date_Week + 1) # add the missing second back in
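The final mutate relies on arithmetic with POSIXct values being in seconds. Here is a minimal base R sketch of that step, assuming Date_Week comes back as POSIXct because start_val is POSIXct. The variable names and the explicit UTC time zone are assumptions made for the example.

```r
# tz is fixed to UTC here only to make the example reproducible
one_second_before <- as.POSIXct("2019-02-17 23:59:59", tz = "UTC")

# Adding a number to a POSIXct value adds that many seconds
week_start <- one_second_before + 1
print(week_start)

# Convert back to Date if the rest of the pipeline expects dates
week_start_date <- as.Date(week_start, tz = "UTC")
print(week_start_date)
```

If the session time zone is not UTC, it is worth checking that the conversion back to Date lands on the intended day.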
CPhil