
I am trying to compute the monthly average from daily data for each group within the data. In this example, there are 3 groups, each with daily observations from 2011-01-01 to 2011-03-31.

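times <- c(31, 28, 31)                      # days in Jan, Feb and Mar 2011
date <- rep(seq(as.Date('2011-01-01'), as.Date('2011-03-31'), by = 1),
            times = 3)                      # one full date series per group
id <- rep(1001:1003, each = sum(times))    # 90 consecutive rows per group
val <- rnorm(length(id), mean = 5, sd = 2)
df <- data.frame(date, id, val)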
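A quick way to confirm the layout is to cross-tabulate `id` against a "YYYY-MM" month key: with one full January to March series per group, every group should show 31, 28 and 31 rows. This sketch rebuilds the example dates and assumes `id <- rep(1001:1003, each = sum(times))` is the intended grouping (one complete date series per id):

```r
times <- c(31, 28, 31)  # days in Jan, Feb and Mar 2011
date <- rep(seq(as.Date("2011-01-01"), as.Date("2011-03-31"), by = 1), times = 3)

# Assumed grouping: each id owns one complete Jan-Mar series
id <- rep(1001:1003, each = sum(times))

# Rows per id and month; each row should read 31, 28, 31
counts <- table(id, format(date, "%Y-%m"))
print(counts)
```

If any count differs, the `id` vector is not aligned with the repeated dates.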

> head(df)
        date   id      val
1 2011-01-01 1001 6.341471
2 2011-01-02 1001 4.353585
3 2011-01-03 1001 8.131239
4 2011-01-04 1001 3.761434
5 2011-01-05 1001 6.344846
6 2011-01-06 1001 7.068889

> tail(df)
          date   id      val
265 2011-03-26 1003 5.644132
266 2011-03-27 1003 4.949719
267 2011-03-28 1003 4.490786
268 2011-03-29 1003 1.739529
269 2011-03-30 1003 2.250610
270 2011-03-31 1003 1.853057

The desired output has one row per month and group, with the monthly mean in `monthlyValue`:

monthYear  id    monthlyValue
2011-01    1001  ?
2011-02    1001  ?
2011-03    1001  ?
....       ....  ..
2011-01    1003  ?
2011-02    1003  ?
2011-03    1003  ?
mpap
Right now, `df$val` is populated with identical values. If you assign the random values to `df$val` outside the `data.frame()` call (e.g. `df$val <- rnorm(...)`), that will solve the issue. – 12b345b6b78 Nov 27 '18 at 21:49

1 Answer

> output <- aggregate(df$val, list(format(df$date, "%Y-%m"), df$id), mean)
> colnames(output) <- c('monthYear', 'id', 'monthlyValue')
> print(output)
  monthYear   id monthlyValue
1   2011-01 1001     5.368910
2   2011-02 1001     4.701553
3   2011-03 1001     5.225284
4   2011-01 1002     5.117631
5   2011-02 1002     4.869240
6   2011-03 1002     4.595431
7   2011-01 1003     5.336175
8   2011-02 1003     5.438803
9   2011-03 1003     4.658504
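The same result can also be obtained with the formula interface of `aggregate()`, which names the output columns after the variables in the formula, so only the value column needs renaming. A minimal sketch, assuming the example data is rebuilt with one full series per `id`; the `set.seed(1)` call is an addition here, only so the run is reproducible:

```r
set.seed(1)  # added only for reproducibility of this sketch
times <- c(31, 28, 31)
date <- rep(seq(as.Date("2011-01-01"), as.Date("2011-03-31"), by = 1), times = 3)
id <- rep(1001:1003, each = sum(times))
val <- rnorm(length(id), mean = 5, sd = 2)
df <- data.frame(date, id, val)

# Month key in "YYYY-MM" form, stored as a column so the formula can use it
df$monthYear <- format(df$date, "%Y-%m")

# Mean of val for every monthYear/id combination
output <- aggregate(val ~ monthYear + id, data = df, FUN = mean)
names(output)[names(output) == "val"] <- "monthlyValue"
print(output)
```

Because the first grouping variable varies fastest in the result, the rows come out as 2011-01, 2011-02, 2011-03 for 1001, then the same months for 1002 and 1003, matching the layout asked for in the question.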
12b345b6b78