1) Using the data defined in the Note at the end use rollapply
as shown below. nms
is the names of the columns to perform the rolling window calculation over or it could be specified as just the column indexes (i.e. nms <- 4:5
). Sum
is like sum except that it will return NA, instead of 0, if given a series which is entirely NA and otherwise it performs sum(X, na.rm = TRUE)
. Note that the NA values added in roll
are so that the series is not shorter than the window width.
library(data.table)
library(zoo)
k <- 2 # prior two months
Sum <- function(x) if (all(is.na(x))) NA else sum(x, na.rm = TRUE)
roll <- function(x) rollapply(c(x, rep(NA, k)), list(1:k), Sum)
nms <- names(mytable)[4:5]
mytable[, (nms) := lapply(.SD, roll), .SDcols = nms, by = "Company"]
giving:
> mytable
Month Year Company ProducedCereals CommercialsShown
1: 6 2016 Kellog 15 19
2: 5 2016 Kellog 12 4
3: 4 2016 Kellog NA NA
4: 6 2016 General Mills 7 19
5: 5 2016 General Mills NA NA
1a) In a comment the situation is mentiond where there are missing rows and only the most recent two calendar months prior to the current row are to be used so fewer than 2 rows might be used in any sum.
It will be convenient in this case to sort the data frame first in order of Company and then date in ascending order which implies that we want right alignment rather than left in rollapply
.
We pass a zoo object with yearmon index to rollapply
so that we have a time index that Sum
can check to subset the input to the desired window. We use a window size of 3 and only sum the values in the window whose times lie within specified bounds. We will specify coredata = FALSE
to rollapply
in order that the data and index be passed to the rollapply
function and not just the data.
k <- 2 # prior 2 months
# inputs zoo object x, subsets it to specified window and sums
Sum2 <- function(x) {
w <- window(x, start = end(x) - k/12, end = end(x) - 1/12)
if (length(w) == 0 || all(is.na(w))) NA_real_ else sum(w, na.rm = TRUE)
}
nms <- names(mytable)[4:5]
setkey(mytable, Company, Year, Month) # sort
# create zoo object from arguments and run rollapplyr using Sum2
roll2 <- function(x, year, month) {
z <- zoo(x, as.yearmon(year + (month - 1)/12))
coredata(rollapplyr(z, k+1, Sum2, coredata = FALSE, partial = TRUE))
}
mytable[, (nms) := lapply(.SD, roll2, Year, Month), .SDcols = nms, by = "Company"]
giving:
> mytable
Month Year Company ProducedCereals CommercialsShown
1: 5 2016 General Mills NA NA
2: 6 2016 General Mills 7 19
3: 4 2016 Kellog NA NA
4: 5 2016 Kellog 12 4
5: 6 2016 Kellog 15
1b) Another approach to missing rows is to conver the data to long form and then to a rectangular form filling in missing cells with NA. That will work as long as the same month and year is not missing in every company.
k <- 2 # sum over k prior months
m <- melt(mytable, id = 1:3)
dd <- as.data.frame.table(tapply(m$value, m[, 1:4, with = FALSE], c),
responseName = "value")
Sum1 <- function(x) {
x <- head(x, -1)
if (length(x) == 0 || all(is.na(x))) NA_real_ else sum(x, na.rm = TRUE)
}
setDT(dd)[, value := rollapplyr(value, k+1, Sum1, partial = TRUE),
by = .(Company, variable)]
dc <- as.data.table(dcast(... ~ variable, data = dd, value = "value"))
setkey(dc, Company, Year, Month)
dc
giving:
Month Year Company ProducedCereals CommercialsShown
1: 4 2016 General Mills NA NA
2: 5 2016 General Mills NA NA
3: 6 2016 General Mills 7 19
4: 4 2016 Kellog NA NA
5: 5 2016 Kellog 12 4
6: 6 2016 Kellog 15 19
2) Another possibility is to convert mytable
to the zoo object z
splitting mytable
by Company and then use rollapply
on that. mytable
is again as shown in the Note at the end. Sum
is from (1).
k <- 2 # prior 2 months
ym <- function(m, y) as.yearmon(paste(m, y), format = "%m %Y")
z <- read.zoo(mytable, index = 1:2, split = k+1, FUN = ym)
Sum <- function(x) if (all(is.na(x))) NA else sum(x, na.rm = TRUE)
rollapply(z, list(-1:-k), Sum, partial = TRUE, fill = NA)
giving:
ProducedCereals.General Mills CommercialsShown.General Mills
Apr 2016 NA NA
May 2016 NA NA
Jun 2016 7 19
ProducedCereals.Kellog CommercialsShown.Kellog
Apr 2016 NA NA
May 2016 12 4
Jun 2016 15 19
Note: The code in the question does not generate the data displayed in the question so we used this instead for the data.table mytable
:
library(data.table)
mytable <-
structure(list(Month = c(6, 5, 4, 6, 5), Year = c(2016, 2016,
2016, 2016, 2016), Company = c("Kellog", "Kellog", "Kellog",
"General Mills", "General Mills"), ProducedCereals = c(6, 3,
12, 5, 7), CommercialsShown = c(12, 15, 4, 20, 19)), .Names = c("Month",
"Year", "Company", "ProducedCereals", "CommercialsShown"), row.names = c(NA,
-5L), class = "data.frame")
mytable <- as.data.table(mytable)