2

I have a dataframe that roughly looks likes this (also dput at the end of the question):

     dates    var1    var2    var3
1997-01-15       0    -0.5    -1.0
1997-01-17       0   -0.42   -0.85
1997-02-03    0.23       0       0
1997-02-09    0.46       0       0

I need to aggregate these data by month. But I also need the monthly frequencies of these data. Now, the raw monthly aggregation is not a problem, e.g.:

myFrame$month <- as.Date(cut(frame$dates, breaks = "month"))
as.data.frame(aggregate(var1 ~ month, frame, mean))

and so on for all the variables (although I'm not sure if that's the most efficient way, since I have to do this for every variable separately) – this gives me a frame that looks like this:

     month    var1    var2    var3
1997-01-01       0   -0.46  -0.925
1997-02-01   0.345       0       0

BUT, since I also need the monthly frequencies of all the variables, I would need a dataframe that looks like this:

     month    var1    var2    var3    freq_v1    freq_v2    freq_v3
1997-01-01       0   -0.46  -0.925          0          2          2
1997-02-01   0.345       0       0          2          0          0

And this is what I'm not sure how to do. Thanks!

DPUT: dput(frame) structure(list(dates = structure(c(9876, 9877, 9878, 9879, 9880, 9881, 9882, 9883, 9884, 9885, 9886, 9887, 9888, 9889, 9890, 9891, 9892, 9893, 9894, 9895, 9896, 9897, 9898, 9899, 9900, 9901, 9902 ), class = "Date"), var1 = c(0, -0.461538461538462, 0, -0.384615384615385, 0, -0.307692307692308, 0, -0.230769230769231, 0, -0.153846153846154, 0, -0.0769230769230769, 0, 0, 0, 0.076923076923077, 0, 0.153846153846154, 0, 0.230769230769231, 0, 0.307692307692308, 0, 0.384615384615385, 0, 0.461538461538462, 0), var2 = c(-0.5, 0, -0.423076923076923, 0, -0.346153846153846, 0, -0.269230769230769, 0, -0.192307692307692, 0, -0.115384615384615, 0, -0.0384615384615384, 0, 0.0384615384615385, 0, 0.115384615384615, 0, 0.192307692307692, 0, 0.269230769230769, 0, 0.346153846153846, 0, 0.423076923076923, 0, 0.5), var3 = c(-1, 0, -0.846153846153846, 0, -0.692307692307692, 0, -0.538461538461538, 0, -0.384615384615385, 0, -0.230769230769231, 0, -0.0769230769230769, 0, 0.0769230769230771, 0, 0.230769230769231, 0, 0.384615384615385, 0, 0.538461538461539, 0, 0.692307692307693, 0, 0.846153846153846, 0, 1), month = structure(c(9862, 9862, 9862, 9862, 9862, 9862, 9862, 9862, 9862, 9862, 9862, 9862, 9862, 9862, 9862, 9862, 9862, 9893, 9893, 9893, 9893, 9893, 9893, 9893, 9893, 9893, 9893), class = "Date")), .Names = c("dates", "var1", "var2", "var3", "month"), row.names = c(NA, -27L), class = "data.frame")

Zlo
  • 1,150
  • 2
  • 18
  • 38
  • 1
    It is not clear about the frequency. perhaps `library(dplyr);frame %>% group_by(month = as.Date(cut(dates, breaks = "month"))) %>% summarise_each(funs(mean, sum(.!=0)), var1:var3)` or `frame %>% group_by(month = as.Date(cut(dates, breaks = "month"))) %>% summarise_each(funs(mean(.[.!=0]), sum(.!=0)), var1:var3)` – akrun Jun 20 '16 at 09:44

2 Answers2

1

You can calculate means and frequencies at that level of aggregation using the library dplyr.

library(lubridate)
library(dplyr)

myFrame$month <- as.Date(cut(myFrame$dates, breaks = "month"))

myFrame_means <- myFrame %>%
                select(-dates) %>%
                group_by(month) %>%
                summarise_each(funs(mean))

myFrame_freq <- myFrame %>%
                    group_by(month) %>%
                    summarise(freq_v1 = sum(var1 !=0),
                              freq_v2 = sum(var2 !=0),
                              freq_v3 = sum(var3 !=0))

cbind(myFrame_means, myFrame_freq[2:4])

The last command outputs:

       month        var1       var2       var3 freq_v1 freq_v2 freq_v3
1 1997-01-01 -0.09049774 -0.1018100 -0.2036199       7       9       9
2 1997-02-01  0.15384615  0.1730769  0.3461538       5       5       5
thepule
  • 1,721
  • 1
  • 12
  • 22
0

Could this be what you're looking for?

library(dplyr)
x %>%
  group_by(month) %>%
  summarise_each(funs(mean, length), var1, var2, var3)

with output:

Source: local data frame [2 x 7]

   month   var1_mean  var2_mean  var3_mean var1_length var2_length var3_length
  (date)       (dbl)      (dbl)      (dbl)       (int)       (int)       (int)
1 1997-01-01 -0.09049774 -0.1018100 -0.2036199          17          17          17
2 1997-02-01  0.15384615  0.1730769  0.3461538          10          10          10
Yun Ching
  • 665
  • 7
  • 10