1

I have a data set as follows. I have Col1, Col2 and Col3. I need to create column 4 (Mean) which is calculated for group as the value of Col3 of that row divided by the sum of last 3 (might have to change to 4,5,6). This needs to be done for each group using col1.

   Col1   Col2 Col3      Mean
1     A Jan-13  695        NA
2     A Feb-13  993        NA
3     A Mar-13  921        NA
4     A Apr-13  595 0.6841702
5     A May-13  632 0.7556796
6     A Jun-13  582 0.8128492
7     A Jul-13  810 1.3432836
8     A Jul-13  804 1.1916996
9     A Aug-13  927 1.2663934
10    A Sep-13  500 0.5903188
11    A Oct-13  719 0.9668310
12    A Nov-13  933 1.3042870
13    A Dec-13  844 1.1765799
14    A Jan-14  942 1.1322115
15    A Feb-14  753 0.8308202
16    A Mar-14  567 0.6699488
17    A Apr-14  507 0.6724138
18    A May-14  517 0.8489327
19    B Jan-13  835        NA
20    B Feb-13  892        NA
21    B Mar-13  510        NA
22    B Apr-13  615 0.8247653
23    B May-13  501 0.7451661
24    B Jun-13  617 1.1383764
25    B Jul-13  951 1.6462781
26    B Jul-13  632 0.9163847
27    B Aug-13  885 1.2068182
28    B Sep-13  821 0.9979741
29    B Oct-13  844 1.0829769
30    B Nov-13  587 0.6905882
31    B Dec-13  663 0.8832149
32    B Jan-14  614 0.8796562
33    B Feb-14  775 1.2473176
34    B Mar-14  688 1.0058480
35    B Apr-14  963 1.3909485
36    B May-14  934 1.1549876
37    C Jan-13  567        NA
38    C Feb-13  972        NA
39    C Mar-13  618        NA
40    C Apr-13  839 1.1668985
41    C May-13  747 0.9226019
42    C Jun-13  574 0.7813067
43    C Jul-13  718 0.9972222
44    C Jul-13  704 1.0358019
45    C Aug-13  982 1.4759519
46    C Sep-13  528 0.6589018
47    C Oct-13 1000 1.3550136
48    C Nov-13  609 0.7278884
49    C Dec-13  980 1.3757604
50    C Jan-14  611 0.7079954
51    C Feb-14  604 0.8236364
52    C Mar-14  674 0.9211845
53    C Apr-14  978 1.5532028
54    C May-14  885 1.1768617
55    D Jan-13  677        NA
56    D Feb-13  620        NA
57    D Mar-13  652        NA
58    D Apr-13  934 1.4376603
59    D May-13  817 1.1110607
60    D Jun-13  665 0.8302122
61    D Jul-13  761 0.9449503
62    D Jul-13  828 1.1074454
63    D Aug-13  659 0.8771074
64    D Sep-13  574 0.7660142
65    D Oct-13  785 1.1426492
66    D Nov-13  622 0.9246779
67    D Dec-13  704 1.0661282
68    D Jan-14  945 1.3429654
69    D Feb-14  971 1.2826948
70    D Mar-14  975 1.1164122
71    D Apr-14  539 0.5593220
72    D May-14  881 1.0635815

Currently I am creating lags from 1 to 6, and then using those to create a new column. This is time consuming as I have to do this for about 20 columns.

Is there a faster/ more efficient way to do this? Preferably using data.table?

RohitP
  • 11
  • 1
  • You don't need to show us the entire data set. A few rows would suffice. – Tim Biegeleisen Nov 06 '15 at 08:12
  • i think this answer help you [1](http://stackoverflow.com/questions/25951776/moving-average-using-r-code) , [2](http://stackoverflow.com/questions/743812/calculating-moving-average-in-r), [3](http://stackoverflow.com/questions/26198551/rolling-mean-moving-average-by-group-id-with-dplyr) – Batanichek Nov 06 '15 at 08:30
  • @Tim: will keep in mind from now on, – RohitP Nov 06 '15 at 09:06
  • @Batanichek: Thanks. this was something I should have thought of. – RohitP Nov 06 '15 at 09:11

2 Answers2

3

you could try the shift function for that popose

library(data.table) ## v1.9.6+
N <- 3L ## Set whatever N you like
setDT(df1)[, Mean := Col3 / (Reduce(`+`, shift(Col3, 1:N)) / N), by = Col1]
head(df1)
#    Col1   Col2 Col3      Mean
# 1:    A Jan-13  695        NA
# 2:    A Feb-13  993        NA
# 3:    A Mar-13  921        NA
# 4:    A Apr-13  595 0.6841702
# 5:    A May-13  632 0.7556796
David Arenburg
  • 91,361
  • 17
  • 137
  • 196
0

We could use embed with split

unsplit(lapply(split(df1$Col3, df1$Col1), function(x) 
           x/head(rowMeans(embed(c(rep(NA,3), x), 3)),-1)), df1$Col1)
akrun
  • 874,273
  • 37
  • 540
  • 662