2

I have a daily data for 15 years from 1993 to 2008. I would like to compute the daily average, for the variable Open in the file, for each day of the year, based on a 31 day Window centred on the day of interest. Thus, 15⨯31 = 465 dates contribute to the statistics of one day.

Output is just 365 values out of the 15 years

The file can be downloaded from here: http://chart.yahoo.com/table.csv?s=sbux&a=2&b=01&c=1993&d=2&e=01&f=2008&g=d&q=q&y=0&z=sbux&x=.csv and can be read:

 df = read.csv("C:/data/table.csv", header = TRUE, stringsAsFactors = FALSE)
bic ton
  • 1,284
  • 1
  • 9
  • 16
  • 1
    Did you know, you can also read from web `read.csv('http://chart.yahoo.com/table.csv?s=sbux&a=2&b=01&c=1993&d=2&e=01&f=2008&g=d&q=q&y=0&z=sbux&x=.csv', header = TRUE, stringsAsFactors = FALSE)` – danas.zuokas Dec 29 '15 at 10:22
  • The data is already daily..you need a moving average? I'd suggest taking a moving average of 31 days with xts library and lagging it by 16 days. – R.S. Dec 29 '15 at 10:25
  • The [function ddays in the lubridate package](https://cran.r-project.org/web/packages/lubridate/lubridate.pdf) creates time spans that deal with issues such as the 29 february. – Paul Rougieux Dec 29 '15 at 13:32
  • 1
    @Roman Luštrik I don't think this question is a duplicate. [That question](http://stackoverflow.com/questions/743812/calculating-moving-average-in-r) doesn't refer to the specific issues of averaging over a time span measured in days and over several years at once. – Paul Rougieux Dec 29 '15 at 13:36
  • 1
    @Roman The question you marked as duplicate doesn't handle missing values. Another reason to reopen this question. ̀zoo::rollapply` does handle missing values [according to this question](http://stackoverflow.com/questions/17765001/using-rollmean-when-there-are-missing-values-na) – Paul Rougieux Dec 29 '15 at 15:00
  • Thanks @Paul4forest for this +1 – bic ton Dec 29 '15 at 15:29
  • 1
    I have posted an answer here: http://stackoverflow.com/questions/34514683/how-to-compute-daily-average-over-31-days-for-15-years-taking-into-account-miss/34514701#34514701 – Paul Rougieux Dec 29 '15 at 16:07

1 Answers1

1
library(data.table)

df <- as.data.table(read.csv('http://chart.yahoo.com/table.csv?s=sbux&a=2&b=01&c=1993&d=2&e=01&f=2008&g=d&q=q&y=0&z=sbux&x=.csv',header = TRUE, stringsAsFactors = FALSE))

df[, MAOpen := filter(Open, rep(1, 31) / 31)]
df[, Date := as.POSIXct(Date)]
df[, YDay := yday(Date)]

df[, mean(MAOpen), by = YDay]
danas.zuokas
  • 4,551
  • 4
  • 29
  • 39
  • 1
    This is the average per calendar day, the question is asking for the average based on a 31day window around the respective day. – mts Dec 29 '15 at 10:28
  • 1
    For a date 2008-02-07 you take the average of 31 days from 2008-01-16 to 2008-02-29 - 18.76548. You take all 38th year days (all seventh of February). For 2007-02-07 you get moving average 33.78355 and so on. 30.06504 is the average of all twelve moving averages. – danas.zuokas Dec 29 '15 at 11:02
  • All right I see it. My answer takes 31 observations not days. – danas.zuokas Dec 29 '15 at 12:27
  • 1
    You can add NA values for missing days by merging with this calendar data frame containing all days: `calendar <- data.frame(date = ymd("1993-03-01")+ddays(1:(15*365+3)))`. Use `merge()` or `dplyr::left-join()` to merge by date with the original data frame . – Paul Rougieux Dec 30 '15 at 07:30