I have a panel data set for daily revenue (and other variables) by ID, where the day with 0 revenue go unreported. I want to fill in these blanks with 0 for my analysis, meaning that for each ID's time series, I need to make sure there is an observation for each day. Each series can begin or end on a date distinct from the other series. I have been attempting to use the "padr" package, but I keep getting an "unused argument" error using the following sample code:
library(padr)
library(dplyr)
#unbalanced panel data
ID <- c(1,1,1,1,
2,2,2,2,2,2,
3,3,3,3,3,3,3,
4,4,4)
DT <- today() + c(1,3,4,5, #ID = 1
3,4,7,8,9,10, #ID = 2
2,5,6,7,8,9,10, #ID = 3
8,10,11) #ID = 4
#The end date denote the max date for each ID
EndDT <- today() + c(5,5,5,5, #ID = 1
13,13,13,13,13,13, #ID = 2
10,10,10,10,10,10,10, #ID = 3
15,15,15) #ID = 4
#random variables v1 and v2 to represent revenue and other variables
set.seed(1)
v1 <- rnorm(20,mean = 10000, sd = 5)
v2 <- rnorm(20,mean = 5000, sd = 1.5)
df <- as.data.frame(cbind(ID,DT,EndDT,v1,v2))
#format to simpler date
df$DT <- as.Date(DT, origin="1970-01-01")
df$EndDT <- as.Date(EndDT, origin="1970-01-01")
df_padded <- arrange(df,ID,DT) %>%
pad(by='DT',group='ID', end_val='EndDT') %>%
fill_by_value(v1,v2, value=0)
My error message:
Error in pad(., by = "DT", group = "ID", end_val = "EndDT") :
unused argument (group = "ID")
Answers not involving padr are also highly welcome.