R's padr package claiming the "datetime variable does not vary" when it does vary

Question

library(tidyverse)
library(lubridate)
library(padr)

df
#> # A tibble: 828 x 5
#>    Scar_Id      Code     Type         Value      YrMo      
#>    <chr>        <chr>    <chr>        <date>     <date>    
#>  1 0070-179     AA       Start_Date   2020-04-22 2020-04-01
#>  2 0070-179     AA       Closure_Date 2020-05-23 2020-05-01
#>  3 1139-179     AA       Start_Date   2020-04-23 2020-04-01
#>  4 1139-179     AA       Closure_Date 2020-05-23 2020-05-01
#>  5 262-179      AA       Start_Date   2019-08-29 2019-08-01
#>  6 262-179      AA       Closure_Date 2020-05-23 2020-05-01
#>  7 270-179      AA       Start_Date   2019-08-29 2019-08-01
#>  8 270-179      AA       Closure_Date 2020-05-23 2020-05-01
#>  9 476-179      BB       Start_Date   2019-09-04 2019-09-01
#> 10 476-179      BB       Closure_Date 2019-11-04 2019-11-01
#> # ... with 818 more rows

I have an R data frame named df, shown above. I want to concentrate on rows 5 and 6, which share the same Scar_Id. I can usually use the padr package to pad the months between two such rows: its pad() function adds rows at an interval the user specifies. The rows marked "X" below show the result I expect.

#>  1 0070-179     AA       Start_Date   2020-04-22 2020-04-01
#>  2 0070-179     AA       Closure_Date 2020-05-23 2020-05-01
#>  3 1139-179     AA       Start_Date   2020-04-23 2020-04-01
#>  4 1139-179     AA       Closure_Date 2020-05-23 2020-05-01
#>  5 262-179      AA       Start_Date   2019-08-29 2019-08-01
#>  X 262-179      NA       NA           NA         2019-09-01
#>  X 262-179      NA       NA           NA         2019-10-01
#>  X 262-179      NA       NA           NA         2019-11-01
#>  X 262-179      NA       NA           NA         2019-12-01
#>  X 262-179      NA       NA           NA         2020-01-01
#>  X 262-179      NA       NA           NA         2020-02-01
#>  X 262-179      NA       NA           NA         2020-03-01
#>  X 262-179      NA       NA           NA         2020-04-01
#>  6 262-179      AA       Closure_Date 2020-05-23 2020-05-01
#>  7 270-179      AA       Start_Date   2019-08-29 2019-08-01
#>  8 270-179      AA       Closure_Date 2020-05-23 2020-05-01
#>  9 476-179      BB       Start_Date   2019-09-04 2019-09-01
#> 10 476-179      BB       Closure_Date 2019-11-04 2019-11-01
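For reference, the expected padding can be sketched on a made-up two-row frame that mirrors rows 5 and 6. The frame `toy` and its values are assumptions for illustration, not the real df; only `pad()` and its `interval`, `by`, and `group` arguments come from padr.

```r
library(padr)

# Hypothetical two-row frame mirroring rows 5 and 6 of df
toy <- data.frame(
  Scar_Id = c("262-179", "262-179"),
  Type    = c("Start_Date", "Closure_Date"),
  YrMo    = as.Date(c("2019-08-01", "2020-05-01")),
  stringsAsFactors = FALSE
)

# Insert one row per missing month within each Scar_Id
padded <- pad(toy, interval = "month", by = "YrMo", group = "Scar_Id")
padded
```

On this toy input the months from 2019-09-01 through 2020-04-01 should appear as new rows with NA in the non-grouping columns, matching the "X" rows above.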

To get there I normally run the command below, which has worked for me with padr before. On this data frame, however, it adds no rows (the result still has 828 rows) and returns the warning shown at the end of the output.

df %>% pad(group = "Scar_Id", by = "YrMo", interval = "month")

#> # A tibble: 828 x 5
#>    Scar_Id      Code     Type         Value      YrMo      
#>    <chr>        <chr>    <chr>        <date>     <date>    
#>  1 0070-179     AA       Start_Date   2020-04-22 2020-04-01
#>  2 0070-179     AA       Closure_Date 2020-05-23 2020-05-01
#>  3 1139-179     AA       Start_Date   2020-04-23 2020-04-01
#>  4 1139-179     AA       Closure_Date 2020-05-23 2020-05-01
#>  5 262-179      AA       Start_Date   2019-08-29 2019-08-01
#>  6 262-179      AA       Closure_Date 2020-05-23 2020-05-01
#>  7 270-179      AA       Start_Date   2019-08-29 2019-08-01
#>  8 270-179      AA       Closure_Date 2020-05-23 2020-05-01
#>  9 476-179      BB       Start_Date   2019-09-04 2019-09-01
#> 10 476-179      BB       Closure_Date 2019-11-04 2019-11-01
#> # ... with 818 more rows
#> Warning message:
#> datetime variable does not vary for 537 of the groups, no padding applied on this / these group(s)

Why does it claim that the "datetime variable does not vary" for rows 5 and 6 when it clearly does? In row 5 YrMo is "2019-08-01" and in row 6 it is "2020-05-01", and those two dates obviously differ.

Any ideas what went wrong? I tried to build a reproducible example but could not: every basic example I created works as expected (as described above). Hopefully these clues help somebody work out what is going on.
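One thing worth checking: the warning counts 537 groups in total, so by itself it does not say whether the group containing rows 5 and 6 is among them. A minimal sketch of how one might list the groups with a single distinct YrMo, using a made-up stand-in for df (the frame `toy_groups` and the ID "999-179" are hypothetical):

```r
library(dplyr)

# Hypothetical stand-in for df: one group spanning two months,
# one group whose rows all fall in the same month
toy_groups <- tibble(
  Scar_Id = c("262-179", "262-179", "999-179", "999-179"),
  YrMo    = as.Date(c("2019-08-01", "2020-05-01", "2020-05-01", "2020-05-01"))
)

# Number of distinct YrMo values per Scar_Id
months_per_id <- toy_groups %>%
  distinct(Scar_Id, YrMo) %>%
  count(Scar_Id, name = "n_months")

# Groups with only one distinct month have nothing to pad
single_month <- filter(months_per_id, n_months == 1)
single_month
```

Running the same pipeline on the real df would show whether "262-179" is really reported as non-varying or whether the warning only concerns other groups.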

What happens if you use `df[5:6,]` instead of `df` to see if you still get the unexpected behaviour. If you do, then you can make it a reprex by doing `dput(df[5:6,])`, and if it doesn't then you have another clue to help solve the puzzle. — Allan Cameron, Apr 25 '20 at 00:39
I ended up using `df %>% group_by(Scar_Id) %>% pad(by = "YrMo", interval = "month")` and the problem went away. — Display name, Apr 28 '20 at 13:29
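The workaround from the last comment can be sketched on a small made-up frame. This only illustrates the call shape; it does not explain why the `group = "Scar_Id"` form failed on the original data, and the frame `toy_df` is an assumption.

```r
library(dplyr)
library(padr)

# Hypothetical frame with two IDs, each with a start and a closure month
toy_df <- tibble(
  Scar_Id = c("262-179", "262-179", "476-179", "476-179"),
  YrMo    = as.Date(c("2019-08-01", "2020-05-01", "2019-09-01", "2019-11-01"))
)

# Group with dplyr first, then pad without the group argument
padded_df <- toy_df %>%
  group_by(Scar_Id) %>%
  pad(by = "YrMo", interval = "month")

padded_df
```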
