I have a data set with various identifying fields (let's call them f1 and f2), the date (split into month and year), and a numerical field (let's call it counts). I have aggregated the data by month and year, and now want to add 12 fields to each row showing the total counts for each of the preceding 12 months where the identifying fields match. To make that task slightly easier, I add a field showing the number of months since the start of 2014.
Data.Grouped <- arrange(Data, f1, f2, Year, Month) %>%
  group_by(f1, f2, Year, Month) %>%
  summarize(total = sum(counts)) %>%
  as.data.frame() %>%
  mutate(Age.Since.2014 = (Year - 2014) * 12 + Month)
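To make the month index concrete, here is a minimal sketch of the same arithmetic on a few made-up Year/Month values (the sample values are hypothetical and not taken from my data). The point is that consecutive calendar months always differ by exactly 1, even across a year boundary.

```r
# Hypothetical sample values, only to illustrate the month-index arithmetic.
Year  <- c(2014, 2014, 2015, 2016)
Month <- c(1, 12, 1, 3)

# Same formula as in the mutate() call: months counted from the start of 2014.
Age.Since.2014 <- (Year - 2014) * 12 + Month
print(Age.Since.2014)
```

December 2014 and January 2015 map to 12 and 13, so "previous month" can be tested as a difference of 1 in this field.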
To make things computationally more efficient, I am going to check only the previous row. This works if I do something like this:
Data.Grouped.Expanded <- mutate(Data.Grouped, PriorMonth = lag(total, default = 0))
However, there are many instances where the previous row does not contain the previous month's data for the same identifying fields (because the data has many holes). My intuition would be to use a simple conditional:
Data.Grouped.Expanded <- mutate(Data.Grouped, PriorMonth = if (
    lag(f1) == f1 &&
    lag(f2) == f2 &&
    lag(Age.Since.2014) == (Age.Since.2014 - 1))
  {lag(total, default = 0)})
Warning messages suggest that it is trying to do the conditional test across the entire lag vector at once and using only the first row's conclusion for all rows. Switching to rowwise() doesn't seem to help. A workaround seems to be multiplying out the conditionals rather than using if:
Data.Grouped.Expanded <- mutate(Data.Grouped, PriorMonth =
  (lag(f1) == f1) *
  (lag(f2) == f2) *
  (lag(Age.Since.2014) == (Age.Since.2014 - 1)) *
  lag(total, default = 0))
This still causes warning messages to appear:
(In ==.default(c(0L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, : longer object length is not a multiple of shorter object length)
However, the results seem to be accurate.
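For reference, here is a minimal sketch of the difference between the two attempts, written in base R on made-up vectors. It rests on the fact that if() expects a single TRUE/FALSE and && is not an element-wise operator, whereas ==, &, * and ifelse() all work element by element. The helper shift_down() is a hypothetical base-R stand-in for dplyr's lag(), and the toy values are assumptions for illustration only.

```r
# Hypothetical toy data: the row for month index 3 is missing (a "hole").
f1    <- c("a", "a", "a", "b")
age   <- c(1, 2, 4, 5)
total <- c(10, 20, 30, 40)

# Hypothetical base-R stand-in for dplyr::lag(), for illustration only.
shift_down <- function(x, n = 1, default = NA) c(rep(default, n), head(x, -n))

# Element-wise test with & (not &&): one TRUE/FALSE/NA per row.
is_prior <- shift_down(f1) == f1 & shift_down(age) == age - 1

# ifelse() chooses a value row by row; the NA in the first row falls back to 0.
PriorMonth <- ifelse(!is.na(is_prior) & is_prior,
                     shift_down(total, default = 0), 0)
print(PriorMonth)
```

Only the second row gets a value (10), because it is the only row whose previous row has the same f1 and a month index exactly one lower.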
My questions are as follows:
What exactly is the warning message saying?
Why does R understand my row-wise intention when I multiply out the conditionals, but not when I use them in an if statement?
How can I wrap this logic into a function that refers to lags such as:
PriorMonth.N = function(n) (lag(f1, n) == f1) * (lag(f2, n) == f2) * (lag(Age.Since.2014, n) == (Age.Since.2014 - n)) * lag(total, n, default = 0)
so that I can write
mutate(Data.Grouped,
       PriorMonth.One = PriorMonth.N(1),
       PriorMonth.Two = PriorMonth.N(2),
       PriorMonth.Three = PriorMonth.N(3))
and so forth?
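To illustrate the kind of function I have in mind, here is a rough sketch in base R. It assumes the columns are passed in as arguments rather than looked up by name inside the function. The names prior_month_n and shift_down and the toy data frame are hypothetical, and shift_down() merely stands in for dplyr's lag().

```r
# Hypothetical base-R stand-in for dplyr::lag(), for illustration only.
shift_down <- function(x, n = 1, default = NA) c(rep(default, n), head(x, -n))

# Hypothetical helper: returns the total from n rows back when that row has
# the same f1/f2 and a month index exactly n lower, and 0 otherwise.
prior_month_n <- function(f1, f2, age, total, n) {
  matches <- shift_down(f1, n) == f1 &
    shift_down(f2, n) == f2 &
    shift_down(age, n) == age - n
  ifelse(!is.na(matches) & matches, shift_down(total, n, default = 0), 0)
}

# Made-up data with a hole at month index 4.
toy <- data.frame(f1 = rep("a", 4), f2 = rep("x", 4),
                  Age.Since.2014 = c(1, 2, 3, 5),
                  total = c(10, 20, 30, 40),
                  stringsAsFactors = FALSE)

toy$PriorMonth.One <- with(toy, prior_month_n(f1, f2, Age.Since.2014, total, 1))
toy$PriorMonth.Two <- with(toy, prior_month_n(f1, f2, Age.Since.2014, total, 2))
print(toy)
```

One thing this toy run shows: for the last row (month index 5), PriorMonth.Two comes out as 0 even though month 3 is present, because that month sits one row back rather than two. So an approach based on row offsets will miss months whenever there are holes in between.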