1

I have a data.table that looks like this:

DT <- data.table(A=1:20, B=1:20*10, C=1:20*100)
DT
    A  B   C
1:  1  10  100
2:  2  20  200
3:  3  30  300
4:  4  40  400
5:  5  50  500
...
20: 20 200 2000

I want to be able to calculate a new column "D" that has the first value as the average of the first 20 rows in column B as the first value, and then I want to use the first row of column D to help calculate the next row value of D.

Say the Average of the first 20 rows of column B is 105. and the formula for the next row in column D is this : DT$D[1]+DT$C[2] where I take the previous row value of D and add the row value of C. The third row will then look like this: DT$D[2]+DT$C[3]

    A  B   C    D
1:  1   10   100     105
2:  2   20   200     305
3:  3   30   300     605
4:  4   40   400    1005
5:  5   50   500    1505
...
20: 20  200  2000  21005

Any ideas on this would be made?

I think shift would be a great help to lag, but dont know how to get rid of the NA that it produces at the first instance?

KhalidN
  • 385
  • 1
  • 7
  • 13

1 Answers1

3

We can take the mean of the first 20 rows of column B and add the cumulative sum of C. The cumulative sum has one special consideration that we want to add a concatenation of 0 and column C without the first value.

DT[, D := mean(B[1:20]) + cumsum(c(0, C[-1]))][]
#      A   B    C     D
#  1:  1  10  100   105
#  2:  2  20  200   305
#  3:  3  30  300   605
#  4:  4  40  400  1005
#  5:  5  50  500  1505
#  6:  6  60  600  2105
#  7:  7  70  700  2805
#  8:  8  80  800  3605
#  9:  9  90  900  4505
# 10: 10 100 1000  5505
# 11: 11 110 1100  6605
# 12: 12 120 1200  7805
# 13: 13 130 1300  9105
# 14: 14 140 1400 10505
# 15: 15 150 1500 12005
# 16: 16 160 1600 13605
# 17: 17 170 1700 15305
# 18: 18 180 1800 17105
# 19: 19 190 1900 19005
# 20: 20 200 2000 21005
David Arenburg
  • 91,361
  • 17
  • 137
  • 196
Pierre L
  • 28,203
  • 6
  • 47
  • 69
  • This is excellent, thanks! What happens if instead of adding C, I want to multiply it with a number? – KhalidN Oct 28 '15 at 13:13
  • Thanks @DavidArenburg. The use of shift is good, but I'm not getting the same sum in the new column – Pierre L Oct 28 '15 at 13:13
  • @KhalidN do you have a column of numbers like `C` or a single value? – Pierre L Oct 28 '15 at 13:17
  • Ok, last attempt, `DT[, D := mean(B[1:20]) + cumsum(c(0L, shift(C, 1L, type = "lead")[-.N]))]` which is kind of silly if we can just do `C[-1]` :)) – David Arenburg Oct 28 '15 at 13:28
  • @PierreLafortune it is a single value, not a column of numbers – KhalidN Oct 28 '15 at 13:32
  • @PierreLafortune I got it working with this formula: DT[, D := mean(DT[1:20,B]) + c(0,2*DT[-1,C])][], where 2 can be replaced with whatever – KhalidN Oct 28 '15 at 13:40
  • And see the edit where `DT[-1, C]` is replaced with `C[-1]`. – Pierre L Oct 28 '15 at 13:41
  • Actually, I have discovered a misunderstanding, the first row value should not be added to every single next row, but only the next, and the value from this row, should be transported to the next one. This is evident when using this formula, where the previous value should be added. I will try and make a new question so its easier to exemplify – KhalidN Oct 28 '15 at 13:59
  • @PierreLafortune Here is the updated question http://stackoverflow.com/questions/33394070/use-previous-calculated-row-value-in-r-continued – KhalidN Oct 28 '15 at 14:40