I see this question come up for Python, pandas, and other tools, but I can't find any help with dplyr for this problem.
library(dplyr) # tibble(), group_by(), summarize() and %>%
set.seed(1234L)
i <- 1:80 # years 1 thru 80
S1 <- .02*i # a trend line
S2 <- S1 + rnorm(n = 80, mean = 0, sd = .2) # same line w/ error added
x <- tibble(S1,
            S2,
            yrs10 = rep(1:8, each = 10), # to aggregate by decades (8 groups)
            yrs16 = rep(1:5, each = 16), # by 16 yr spans (5 groups)
            yrs20 = rep(1:4, each = 20), # by 20 yr spans (4 groups)
            yrs40 = rep(1:2, each = 40)  # by 40 yr spans (2 groups)
)
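A quick sanity check of the grouping columns (an addition, not part of the question): because 80 divides evenly by 10, 16, 20 and 40, each `rep(..., each = n)` column should split the 80 years into equal, consecutive spans. This sketch rebuilds only those columns in a base-R `data.frame` and counts rows per group with `table()`; the name `sizes` is illustrative.

```r
# Rebuild only the grouping columns from the question (base R, no packages)
x <- data.frame(
  yrs10 = rep(1:8, each = 10),
  yrs16 = rep(1:5, each = 16),
  yrs20 = rep(1:4, each = 20),
  yrs40 = rep(1:2, each = 40)
)

# A data.frame is a list of columns, so lapply() visits each grouping column
# and table() counts how many rows fall into each group
sizes <- lapply(x, table)
sizes$yrs16
```

`sizes$yrs16` should report five groups of 16 rows each, and likewise for the other spans.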
This is my problem: I cannot find a way to feed the four grouping columns into group_by() in turn without declaring four separate result variables and repeating the same pipeline each time:
x10 <- x %>%
  group_by(yrs10) %>%
  summarize(cor = cor(S1, S2)^2) # squared to show prop of var accounted for
x16 <- x %>%
  group_by(yrs16) %>%
  summarize(cor = cor(S1, S2)^2) # squared to show prop of var accounted for
x20 <- x %>%
  group_by(yrs20) %>%
  summarize(cor = cor(S1, S2)^2) # squared to show prop of var accounted for
x40 <- x %>%
  group_by(yrs40) %>%
  summarize(cor = cor(S1, S2)^2) # squared to show prop of var accounted for
x10
x16
x20
x40
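Since the four pipelines differ only in the grouping column, one option is to wrap the pipeline in a function that takes that column's name as a string. This is a minimal sketch assuming dplyr's tidy-evaluation support: `sym()` (from rlang, re-exported by dplyr) turns the string into a column symbol, and `!!` injects it into `group_by()`. The helper name `r2_by` is hypothetical.

```r
library(dplyr)

# Same data as in the question
set.seed(1234L)
i <- 1:80
S1 <- .02 * i
S2 <- S1 + rnorm(n = 80, mean = 0, sd = .2)
x <- tibble(S1, S2,
            yrs10 = rep(1:8, each = 10),
            yrs16 = rep(1:5, each = 16),
            yrs20 = rep(1:4, each = 20),
            yrs40 = rep(1:2, each = 40))

# Hypothetical helper: squared correlation of S1 and S2 within each group of
# the column named by the string `grp`. sym() converts the string to a symbol
# and !! injects it, so group_by() sees the bare column name.
r2_by <- function(data, grp) {
  data %>%
    group_by(!!sym(grp)) %>%
    summarize(cor = cor(S1, S2)^2)
}

x10 <- r2_by(x, "yrs10")
x40 <- r2_by(x, "yrs40")
x10
```

Because the injected symbol keeps the column's own name, each result has the same `yrs10`/`cor` (or `yrs40`/`cor`) columns as the corresponding hand-written pipeline.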
There should be a way to feed yrs10 through yrs40 as a list and collect the results as a list, but every attempt I make ends in an error. The dplyr answers I've found here don't seem to cover this scenario. What am I missing?
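To pass yrs10 through yrs40 as a list and collect the results as a list, one sketch is to loop over the column names with base R's `lapply()`, again using `!!sym()` to turn each name into a grouping column. This assumes a dplyr version with tidy evaluation (0.7 or later); the names `spans` and `res` are illustrative, not from the question.

```r
library(dplyr)

# Same data as in the question
set.seed(1234L)
i <- 1:80
S1 <- .02 * i
S2 <- S1 + rnorm(n = 80, mean = 0, sd = .2)
x <- tibble(S1, S2,
            yrs10 = rep(1:8, each = 10),
            yrs16 = rep(1:5, each = 16),
            yrs20 = rep(1:4, each = 20),
            yrs40 = rep(1:2, each = 40))

# Grouping columns to loop over, given as strings
spans <- c("yrs10", "yrs16", "yrs20", "yrs40")

# One summary tibble per grouping column; !!sym(g) turns the string g
# into the column it names before group_by() sees it
res <- lapply(spans, function(g) {
  x %>%
    group_by(!!sym(g)) %>%
    summarize(cor = cor(S1, S2)^2) # squared: proportion of variance
})
names(res) <- spans # label each list element by its grouping column

res$yrs16
```

`lapply()` returns an unnamed list, so `names(res) <- spans` labels each element; `res$yrs16` then plays the role of `x16`.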