1

I am trying to write a code that creates a new column with moving averages based on 'year' where the number of rows for each year is variable where each year only has one unique value being repeated for each row of that year. I want to calculate moving averages based these unique values independent of the number of rows per year.

Just a FYI; I'm very new to R and programming so if I missed something for you to understand my problem please let me know.

For example, the type of data I'm working with looks like this:

df <- data.frame(year = c(1702, 1702, 1702, 1702,   1702,   1703,   1703,   1703,   1704,   1704,   1705,   1705,   1705,   1705,   1706,   1706,   1707,   1707,   1707, 1708, 1708,   1708,   1708,   1708,   1709,   1709,   1709,   1709,   1709), avgtemp = c(5.3, 5.3,    5.3,    5.3,    5.3,    3.9,    3.9,    3.9,    6.12,   6.12,   4.16,   4.16,   4.16,   4.16,   5.65,   5.65,   3.11,   3.11,   3.11, 5.17, 5.17,   5.17,   5.17,   5.17,   4.75,   4.75,   4.75,   4.75,   4.75))

I found this post, Moving Average by Unique Date with multiple observations per date, and tried the solution offered there by Mark Peterson but it doesnt work for me.

I've tried the following code.

rolledavg <-
  df %>%
  group_by(year) %>%
  summarise(rollavg = mean(avgtemp)) %>%
  ungroup() %>%
  arrange(year) %>%
  mutate( ma3temp = rollapply(rollavg
                              , 3
                              , mean
                              , align= "right"
                              , partial=T
                              , fill = NA))

I get the following error: "Error in order(year) : argument 1 is not a vector".

The expected output should be something like this:

expected output df

I would appreciate any help I can get. Don't mind working with other packages/solutions than the one offered above.

kassler
  • 11
  • 3
  • Can you please show the expected output? – tmfmnk Nov 05 '19 at 13:07
  • Sorry, I realized I had not properly formulated my original question, but your question made it clear to me and I've edited the post accordingly. See expected output pic. – kassler Nov 05 '19 at 13:47

1 Answers1

1

Something like this with sapply()?

dat$ra <- sapply(1:nrow(dat), function(n) mean(dat$avgtemp[1:n]))
#    year avgtemp       ra
# 1  1702    5.30 5.300000
# 2  1702    5.30 5.300000
# 3  1702    5.30 5.300000
# 4  1702    5.30 5.300000
# 5  1702    5.30 5.300000
# 6  1703    3.90 5.066667
# 7  1703    3.90 4.900000
# 8  1703    3.90 4.775000
# 9  1704    6.12 4.924444
# 10 1704    6.12 5.044000
# 11 1705    4.16 4.963636
# 12 1705    4.16 4.896667
# 13 1705    4.16 4.840000
# 14 1705    4.16 4.791429
# 15 1706    5.65 4.848667
# 16 1706    5.65 4.898750
# 17 1707    3.11 4.793529
# 18 1707    3.11 4.700000
# 19 1707    3.11 4.616316

Note: If you want just two digits, use round(mean(.), 2).

Update

Following the update of your question, you may calculate the moving average with filter()1 from a unique version of your data frame and merge the result with the original data frame.

dat <- merge(dat, transform(unique(dat), ra=filter(avgtemp, rep(1/3, 3), sides=1)))
#    year avgtemp       ra
# 1  1702    5.30       NA
# 2  1702    5.30       NA
# 3  1702    5.30       NA
# 4  1702    5.30       NA
# 5  1702    5.30       NA
# 6  1703    3.90       NA
# 7  1703    3.90       NA
# 8  1703    3.90       NA
# 9  1704    6.12 5.106667
# 10 1704    6.12 5.106667
# 11 1705    4.16 4.726667
# 12 1705    4.16 4.726667
# 13 1705    4.16 4.726667
# 14 1705    4.16 4.726667
# 15 1706    5.65 5.310000
# 16 1706    5.65 5.310000
# 17 1707    3.11 4.306667
# 18 1707    3.11 4.306667
# 19 1707    3.11 4.306667

This is also possible with the zoo::rollmean() function.

dat <- merge(dat, transform(unique(dat), ra=c(rep(NA, 3 - 1), zoo::rollmean(avgtemp, 3))))

Data

dat <- structure(list(year = c(1702, 1702, 1702, 1702, 1702, 1703, 1703, 
1703, 1704, 1704, 1705, 1705, 1705, 1705, 1706, 1706, 1707, 1707, 
1707), avgtemp = c(5.3, 5.3, 5.3, 5.3, 5.3, 3.9, 3.9, 3.9, 6.12, 
6.12, 4.16, 4.16, 4.16, 4.16, 5.65, 5.65, 3.11, 3.11, 3.11)), row.names = c(NA, 
-19L), class = "data.frame")
jay.sf
  • 60,139
  • 8
  • 53
  • 110
  • Sorry I didn't properly specify my question in the original post, I've updated it now to more clearly show what I'm asking. Thanks anyway for the quick answer :) – kassler Nov 05 '19 at 13:50