
I'm trying to run a rolling regression with dplyr. I'm using rollapplyr from the zoo package together with lsfit, since I'm only interested in the beta (slope) of the regression. Here's what I've tried:

library(dplyr); library(zoo)

df1 = expand.grid(site = seq(10),
                    year = 2000:2004,
                    day = 1:50)

df1 %>%
  group_by(year) %>%
  mutate(beta1 = rollapplyr(data = site,
                            width = 5,
                            FUN = lsfit,
                            x = day))

I'm getting this error: Error: not all arguments have the same length

I think rollapplyr accepts non-zoo objects but I may be wrong. It could also be that the piping (%>%) does not play well with rollapplyr as it requires a data object in the function.

Any idea?

EDIT: My question is different from "rolling regression with dplyr": I want to use pipes so that I can use group_by.

Pierre Lapointe

1 Answer


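rollapplyr will not cycle through multiple vectors. It slides the window over data only; extra arguments such as x = day are passed to FUN unchanged, so each 5-element slice of site is paired with the full day vector and the lengths do not match. We can write our own rolling apply function that uses Map to build the index vector for each window: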

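rollapplydf <- function(xx, width) {
  l <- length(xx)
  # index vectors for each window: 1:width, 2:(width + 1), ...
  sq <- Map(':', 1:(l - width + 1), width:l)
  # slope of each window regressed on its position 1..width
  slopes <- vapply(sq, function(i) unname(coef(lm(xx[i] ~ seq_along(i)))[2]), numeric(1))
  # pad the first width - 1 rows, which have no complete window
  c(rep(NA_real_, width - 1L), slopes)
}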
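
To see what the Map call contributes, here is a small sketch on a made-up vector (the values in xx are hypothetical, chosen only for illustration). It builds the window indices the same way as the function above and fits one slope per window against the position 1..width:

```r
# Toy input (hypothetical values) and window width
xx <- c(2, 4, 6, 9, 12)
width <- 3

# One index vector per window: 1:3, 2:4, 3:5
windows <- Map(':', 1:(length(xx) - width + 1), width:length(xx))

# Slope of each window regressed on its position within the window
slopes <- vapply(
  windows,
  function(i) unname(coef(lm(xx[i] ~ seq_along(i)))[2]),
  numeric(1)
)

print(windows)
print(slopes)
```

Each element of windows is a run of row positions, so xx[i] picks out exactly one window at a time.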

So we can add this to the pipe:

library(dplyr)
df1 %>% 
  group_by(year) %>% 
  mutate(beta1 = rollapplydf(xx = site, width = 5) )

# Source: local data frame [2,500 x 4]
# Groups: year [5]
# 
#     site  year   day beta1
#    (int) (int) (int) (dbl)
# 1      1  2000     1    NA
# 2      2  2000     1    NA
# 3      3  2000     1    NA
# 4      4  2000     1    NA
# 5      5  2000     1     1
# 6      6  2000     1     1
# 7      7  2000     1     1
# 8      8  2000     1     1
# 9      9  2000     1     1
# 10    10  2000     1     1
# ..   ...   ...   ...   ...
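
Note that rollapplydf regresses each window on its position 1..width, not on day. If the slope against a second column is what you need, one possible approach (a sketch, not part of the answer above) is to hand rollapplyr a two-column matrix and set by.column = FALSE, so that FUN receives both columns of each window together. The toy vectors x and y and the helper window_slope are hypothetical names used only for this illustration:

```r
library(zoo)

# Toy data (hypothetical values)
x <- 1:6
y <- c(2, 4, 6, 9, 12, 15)

# Slope of column 2 on column 1 within one window;
# m is the width-by-2 matrix slice passed in by rollapplyr
window_slope <- function(m) unname(coef(lm(m[, 2] ~ m[, 1]))[2])

# by.column = FALSE passes whole rows of the window to FUN;
# fill = NA pads the positions that have no complete window
beta <- rollapplyr(cbind(x, y), width = 3, FUN = window_slope,
                   by.column = FALSE, fill = NA)
print(beta)
```

Inside the pipe this would presumably become mutate(beta1 = rollapplyr(cbind(day, site), width = 5, FUN = window_slope, by.column = FALSE, fill = NA)). In df1 as constructed, day is constant within many 5-row windows, so those slopes would come back as NA.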
Pierre L