
I have a data set with several identifying fields (let's call them f1 and f2), a date (split into month and year), and a numerical field (let's call it counts). I have aggregated the data by month and year, and now want to add 12 fields to each row showing the total counts for each of the preceding 12 months where the identifying fields match. To make that task slightly easier, I add a field showing the number of months since the start of 2014.

Data.Grouped <- arrange(Data, f1, f2, Year, Month) %>%
      group_by(f1, f2, Year, Month) %>%
      summarize(total = sum(counts)) %>%
      as.data.frame() %>%
      mutate(Age.Since.2014 = (Year - 2014)*12 + Month)
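For reference, here is a minimal reproducible sketch of the aggregation step. The toy `Data` frame and its values are made up for illustration (they are not the real data), and the group A/x deliberately has no rows for March 2014 so that the "holes" described below show up.

```r
library(dplyr)

# Hypothetical toy data: group A/x has a hole (no rows for 2014-03)
Data <- data.frame(
  f1     = c("A", "A", "A", "A", "B", "B"),
  f2     = c("x", "x", "x", "x", "y", "y"),
  Year   = c(2014, 2014, 2014, 2014, 2014, 2014),
  Month  = c(1, 1, 2, 4, 1, 2),
  counts = c(5, 3, 2, 7, 1, 4),
  stringsAsFactors = FALSE
)

# Same pipeline as above: one row per f1/f2/Year/Month with summed counts
Data.Grouped <- arrange(Data, f1, f2, Year, Month) %>%
  group_by(f1, f2, Year, Month) %>%
  summarize(total = sum(counts)) %>%
  as.data.frame() %>%
  mutate(Age.Since.2014 = (Year - 2014) * 12 + Month)

print(Data.Grouped)
```

With this toy input the result has five rows, and the A/x rows jump from month index 2 straight to 4.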

To make things computationally more efficient, I am going to check only the previous row. This works if I do something like this:

Data.Grouped.Expanded <- mutate(Data.Grouped, PriorMonth = lag(total, default = 0))

However, there are many instances where the previous row does not contain the previous month's data for the same identifying fields (because the data has many holes). My intuition would be to use a simple conditional:

Data.Grouped.Expanded <- mutate(Data.Grouped, PriorMonth = if(
    lag(f1) == f1 &&
    lag(f2) == f2 &&
    lag(Age.Since.2014) == (Age.Since.2014 - 1))
    {lag(total, default = 0)})

Warning messages suggest that it is trying to do the conditional test across the entire lag vector at once and using only the first row's conclusion for all rows. Switching into rowwise doesn't seem to help. A workaround seems to be multiplying out the conditionals rather than using if.

    Data.Grouped.Expanded <- mutate(Data.Grouped, PriorMonth =
        (lag(f1) == f1) *
        (lag(f2) == f2) *
        (lag(Age.Since.2014) == (Age.Since.2014 - 1)) *
        lag(total, default = 0))

This still causes warning messages to appear:

    In ==.default(c(0L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, : longer object length is not a multiple of shorter object length

However, the results seem to be accurate.
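To make the contrast between the two attempts concrete, here is a small sketch on made-up vectors (the names `age`, `total`, and `is_prev_month` are hypothetical). `if` and `&&` expect a single TRUE/FALSE value, whereas `==`, `&`, `*`, and `ifelse` work element by element, which is presumably why the multiplied version behaves row by row.

```r
library(dplyr)

# Hypothetical toy vectors for a single group: month index and monthly totals
age   <- c(1, 2, 4, 5)
total <- c(8, 2, 7, 3)

# Element-wise test: is the previous row exactly one month earlier?
# The first element is NA because there is no previous row.
is_prev_month <- lag(age) == age - 1

# ifelse() evaluates the test for every element; NA is treated as "no match"
prior <- ifelse(!is.na(is_prev_month) & is_prev_month, lag(total), 0)

print(is_prev_month)  # NA TRUE FALSE TRUE
print(prior)          # 0 8 0 7
```

Note that in the multiplied version the first row would come out as NA rather than 0, because the comparison against a missing lagged value is itself NA.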

My questions are as follows:

  1. What exactly is the warning message saying?

  2. Why does the program understand my rowwise intention when I multiply out the conditionals, but not when I use them in an if statement?

  3. How can I wrap this logic into a function that refers to lags such as:

    PriorMonth.N = function(n)
        (lag(f1, n) == f1) *
        (lag(f2, n) == f2) *
        (lag(Age.Since.2014, n) == (Age.Since.2014 - n)) *
        lag(total, n, default = 0)
    

For the purposes of saying

mutate(Data.Grouped, 
    PriorMonth.One = PriorMonth.N(1),
    PriorMonth.Two = PriorMonth.N(2),
    PriorMonth.Three = PriorMonth.N(3))

and so forth
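As a rough sketch of what such a helper could look like, the version below passes the columns in as explicit arguments instead of relying on the function finding `f1`, `f2`, and so on by itself. The name `prior_month_n`, its argument order, and the toy `Data.Grouped` values are all assumptions made for illustration.

```r
library(dplyr)

# Hypothetical helper: the columns are passed in explicitly as vectors
prior_month_n <- function(total, f1, f2, age, n) {
  same_group <- lag(f1, n) == f1 & lag(f2, n) == f2
  n_back     <- lag(age, n) == age - n
  hit        <- same_group & n_back
  # NA (no row n steps back) is treated as "no match" and yields 0
  ifelse(!is.na(hit) & hit, lag(total, n), 0)
}

# Made-up aggregated data; group A/x has no row for month index 3
Data.Grouped <- data.frame(
  f1             = c("A", "A", "A", "B", "B"),
  f2             = c("x", "x", "x", "y", "y"),
  total          = c(8, 2, 7, 1, 4),
  Age.Since.2014 = c(1, 2, 4, 1, 2),
  stringsAsFactors = FALSE
)

Data.Grouped.Expanded <- mutate(Data.Grouped,
  PriorMonth.One = prior_month_n(total, f1, f2, Age.Since.2014, 1),
  PriorMonth.Two = prior_month_n(total, f1, f2, Age.Since.2014, 2))

print(Data.Grouped.Expanded)
```

One caveat with this design: `lag(x, n)` looks n rows back, not n months back. In the toy data the A/x row with month index 4 gets 0 for `PriorMonth.Two`, even though month index 2 exists, because the hole at month 3 puts it only one row back.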

dchu58
One, consider a smaller [minimal question](http://stackoverflow.com/help/mcve) and [reproducible question](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). Two, `mutate` expects vectors as long as the number of rows, so you need to research and understand the clear differences between `if` and (say) `ifelse` (although the latter has baggage of its own). – r2evans Oct 17 '16 at 05:38
