R data.table calculations with previous row dependencies

Question

Below is some data which I currently calculate in Excel.

col_A   col _B  col_C col_D col_E   col_F    col_G
-1.5%   0.010   1.00    1   1.00     -       -   
-5.4%   0.024   1.00    1   1.00     0.01   -0.00 
-7.9%   0.036   1.00    1   1.00     0.02   -0.00 
-12.7%  0.052   0.99    1   0.99     0.06   -0.01 
-4.6%   0.049   0.98    1   0.98     0.19   -0.01 
-8.3%   0.051   0.95    1   0.95     0.39   -0.03 
-7.3%   0.052   0.88    1   0.88     1.00   -0.07 
-9.2%   0.055   0.69    1   0.69     2.31   -0.21 
-7.9%   0.055   0.38    1   0.38     5.63   -0.44 
-2.2%   0.051   0.29    1   0.29     11.13  -0.24

I have been trying to perform the calculations in R using data.table. The problem I have is that data.table performs calculation column-wise. I need the calculations to be performed row-wise, because of dependencies on the results of previous row values. The Excel-formulas for the calculated columns are given below, with "T" indicating "current row" and "T-1" indication "previous row"

col_C: (col_C.T-1) * (1 + col_G.T)

col_D: max (Col_C.T, col_D.T-1)

col_E: (col_C.T / col_D.T)

col_F: max ((1 - (col_C.T-1 / col_D.T-1)) / col B.T-1), 0.01)

col_G: col_A * col_F

Any assistance is greatly appreciated.

I assume you're looking for `data.table::shift` to lag/lead entries? — Maurits Evers, Feb 02 '19 at 12:47
I have found some success using data.table::shift. However, the calculations of each column is dependant on the results of the previous row, which is not properly captured. — Sarel Louw, Feb 02 '19 at 12:54
(1) `shift` allows you to calculate values based on previous *row* entries, so I would imagine `shift` is the way to go here; (2) however I do struggle to make sense of your "rules", which look cyclic to me. For example entries in column C are based on column G, which in turn depends on column F, which depends on column C again. — Maurits Evers, Feb 02 '19 at 13:09
Yes it is the dependencies that are giving a headache. Note that the it jumps from T to T-1, for example when F goes back to C. — Sarel Louw, Feb 02 '19 at 14:09
Yes I understand that, but your problem statement is not very clear (to me); we still need initial values for certain columns. For instance, are the first row values for columns C, D and E always `1.0`? — Maurits Evers, Feb 02 '19 at 15:41
The starting values are effectively 1 for those columns yes. — Sarel Louw, Feb 03 '19 at 05:33
Please, [edit] your question and re-post the Excel data with more decimal places. The limited precision of only 2 decimals may not be sufficient to reproduce the expected result. Thank you. — Uwe, May 29 '19 at 05:53

Maurits Evers · Answer 1 · 2019-02-03T06:15:10.513

0

Ok, so this is not an answer but too long for a comment.

Please double-check your rules! They are not consistent with the input and expected output.

To demonstrate, let's take rows 2 and 3

col_A   col _B  col_C col_D col_E   col_F    col_G
...
-5.4%   0.024   1.00    1   1.00     0.01   -0.00 
-7.9%   0.036   1.00    1   1.00     0.02   -0.00

and calculate col_F value for row i = 3 according to your rules:

col_F[i] = max((1 - col_C[i - 1] / col_D[i - 1]) / col_B[i - 1], 0.01)
         = max((1 - 1 / 1) / 0.024, 0.01)
         = max(0, 0.01)
         = 0.01

So the value in row 3 for col_F should be 0.01 instead of 0.02.

There are possibilities:

Your rules are incorrect, or
your starting values of entries in col_C, col_D and col_E for row 1 are incorrect.

Either way, at the moment input data, expected output and rules do not agree.

edited Feb 03 '19 at 06:15

answered Feb 03 '19 at 06:00

Maurits Evers

49,617
4
47
68

To me, the rules are fine. The discrepancies you have observed are due to the limited precision of the values posted. I entered the formulas in Excel to double check. The value in row 3 of `col_C` is computed as `0.99946` which is printed as `1.00` with 2 digits precision. – Uwe May 28 '19 at 15:55
@Uwe *"The discrepancies you have observed are due to the limited precision of the values posted"* Hmm, really? All I've done is used the numbers provided by OP. He doesn't mention anywhere that numbers are rounded, nor does he seem to provide access to raw data anywhere. I don't work with Excel, so I'm not sure how/what you checked. I don't see where my example above fails. – Maurits Evers May 29 '19 at 03:38

score 0 · Answer 2 · edited Jun 20 '20 at 09:12

If there are no other conditions which require to use data.table I suggest to implement the rowwise calculations using a matrix:

m <- data.matrix(dt)
m[, 3:7] <- NA

for (i in seq.int(nrow(m))) {
  if (i == 1L) {
    m[i, "col_F"] <- 0
    m[i, "col_G"] <- 0 
    m[i, "col_C"] <- 1
    m[i, "col_D"] <- 1
  } else {
    m[i, "col_F"] <- max((1 - (m[i-1, "col_C"] / m[i-1, "col_D"])) / m[i-1, "col_B"], 0.01)
    m[i, "col_G"] <- m[i, "col_A"] * m[i, "col_F"]
    m[i, "col_C"] <- m[i-1, "col_C"] * (1 + m[i, "col_G"])
    m[i, "col_D"] <- max(m[i, "col_C"], m[i-1, "col_D"])
  }
m[i, "col_E"] <- m[i, "col_C"] / m[i, "col_D"]  
}

m

       col_A col_B     col_C col_D     col_E       col_F        col_G
 [1,] -0.015 0.010 1.0000000     1 1.0000000  0.00000000  0.000000000
 [2,] -0.054 0.024 0.9994600     1 0.9994600  0.01000000 -0.000540000
 [3,] -0.079 0.036 0.9976835     1 0.9976835  0.02250000 -0.001777500
 [4,] -0.127 0.052 0.9895302     1 0.9895302  0.06434834 -0.008172239
 [5,] -0.046 0.049 0.9803653     1 0.9803653  0.20134322 -0.009261788
 [6,] -0.083 0.051 0.9477596     1 0.9477596  0.40070748 -0.033258721
 [7,] -0.073 0.052 0.8768905     1 0.8768905  1.02432085 -0.074775422
 [8,] -0.092 0.055 0.6858958     1 0.6858958  2.36749020 -0.217809099
 [9,] -0.079 0.055 0.3764416     1 0.3764416  5.71098585 -0.451167882
[10,] -0.022 0.051 0.2825483     1 0.2825483 11.33742486 -0.249423347

The deviations in the last 4 rows of col_F from OP's expected result might be due to the limited precision of the posted values of col_A and col_B.

Data

library(data.table)

dt <- fread("col_A   col_B  col_C col_D col_E   col_F    col_G
-1.5%   0.010   1.00    1   1.00     -       -   
-5.4%   0.024   1.00    1   1.00     0.01   -0.00 
-7.9%   0.036   1.00    1   1.00     0.02   -0.00 
-12.7%  0.052   0.99    1   0.99     0.06   -0.01 
-4.6%   0.049   0.98    1   0.98     0.19   -0.01 
-8.3%   0.051   0.95    1   0.95     0.39   -0.03 
-7.3%   0.052   0.88    1   0.88     1.00   -0.07 
-9.2%   0.055   0.69    1   0.69     2.31   -0.21 
-7.9%   0.055   0.38    1   0.38     5.63   -0.44 
-2.2%   0.051   0.29    1   0.29     11.13  -0.24 ", na.strings = "-")
# convert percent string to numeric
dt[, col_A := readr::parse_number(col_A) / 100]

R data.table calculations with previous row dependencies

2 Answers2

Data

Linked