1

I'm using R to investigate how the return affects a person's pension account. In order to do this I'm calculating the development of the pension account from age 25 until retirement at age 70 for 1000 different return scenarios. I'm using the variables expenses(e), monthly deposit(m), return in percent(r), account balance (y) and return in euros (x). They are all stored in data frames with the dimensions 46x1000.

I have succesfully managed to calculate it using a for loop. However this is very slow, and since i'm doing a lot of these i am wondering if someone have an idea to speed up the code. I have tried with apply functions and with vectorization but cannot get it to work. My problem is that i have to calculate the numbers for year i before calculating the numbers for year i+1. I have searched the internet for a solution, but have a hard time finding answers which apply for my specific problem. I should note that I'm still pretty new to R.

A have written a simplified version of the code im using:

for (i in 3:46) {
x[i-1,]<-(y[i-1,]+m[i-1,]*6-0.5*e[i-1,])*r[i-1,]
y[i,]<-y[i-1,]+x[i-1,]-e[i-1,]+m[i-1,]*12 
}

I hope someone is able to help, and thanks in advance.

Best regards Rasmus

Rasmus
  • 43
  • 3
  • 1
    You could use the `RCpp` package and write your calculations in `C++`. This way you are guaranteed to have a good performance and your code seems pretty easy to migrate. – Gregor de Cillia Jun 29 '17 at 09:51
  • 1
    Check out this: https://stackoverflow.com/questions/2908822/speed-up-the-loop-operation-in-r/8474941#8474941. Both the question and the answers are very good. – p0bs Jun 29 '17 at 09:53

1 Answers1

4

Your process looks to me like it needs the loop, since each iteration depends on the one before it. As @Gregor de Cillia mentions in the comments, you could do this in C++ for a speed improvement.

First, set up some data.

set.seed(1)
e <- matrix( data = rnorm( n = 46000, mean = 1000, sd = 200 ),
                         nrow = 46,
                         ncol = 1000 )
m <- matrix( data = rnorm( n = 46000, mean = 2000, sd = 200 ),
                         nrow = 46,
                         ncol = 1000 )
r <- matrix( data = rnorm( n = 46000, mean = 4, sd = 0.5 ),
                         nrow = 46,
                         ncol = 1000 )
x <- matrix( data = NA_real_, nrow = 45, ncol = 1000 )
y <- matrix( data = NA_real_, nrow = 46, ncol = 1000 )
y[1,] <- rnorm( n = 1000, 10000, 1000 )

Then define a C++ function in an Rcpp file. This returns a list with your two matrices x and y as list items:

List pension( NumericMatrix e,
              NumericMatrix m,
              NumericMatrix r,
              NumericVector yfirstrow ) {

    int ncols = e.cols();
    int nrows = e.rows();

    NumericMatrix x( nrows - 1, ncols );
    NumericMatrix y( nrows, ncols );

    y( 0, _ ) = yfirstrow;

    for( int i = 1; i < nrows; i++ ) {
        x( i-1, _ ) = ( y( i-1, _ ) + m( i-1, _ ) * 6 - 0.5 * e( i-1, _ ) ) * r( i-1, _ );
        y( i, _ ) = y( i-1, _ ) + x( i-1, _ ) - e( i-1, _ ) + m( i-1, _ )* 12;
    };

    List ret;
    ret["x"] = x;
    ret["y"] = y;

    return ret;

}

Compare the two methods for speed.

microbenchmark::microbenchmark(
    R = {
        for (i in 2:46) {
            x[i-1,] <- unlist( (y[i-1,] + m[i-1,]*6 - 0.5*e[i-1,] ) * r[i-1,] )
            y[i,]<- unlist( y[i-1,]+x[i-1,]-e[i-1,]+m[i-1,]*12 )
        }
    },
    cpp = {
        cppList <- pension( e, m, r, y[1,] )
    },
    times = 100
)

Make sure the outputs match:

> identical( x, cppList$x )
[1] TRUE
> identical( y, cppList$y )
[1] TRUE

The speed test results:

Unit: microseconds
 expr      min       lq     mean   median       uq       max neval
    R 3309.962 3986.569 6961.838 5244.479 6219.215 96576.592   100
  cpp  879.713  992.229 1266.014 1124.345 1273.691  3041.966   100

So the Rcpp solution is around 5x faster here, but to be honest, the R loop you've made isn't too shabby for the dataset you're working with (with only 45 iterations, the overhead of the R loop isn't too much of a hindrance). If you really need the speed, c++ can help.

rosscova
  • 5,430
  • 1
  • 22
  • 35
  • Thanks you so much for the response, i will take a look at it and hopefully it will solve my issue. I know the loop in my example do not take that much time to run, but i have more complex loops in my code which have loops like this in them, and once you have to run the loop in my example many times it takes quite some time. – Rasmus Jun 29 '17 at 10:58
  • Thanks for turning my comment into an answer :). Just a remark: columnwise calculation (`y[,i]<- unlist(...`) might be slightly better due to the internal storage model of `R`. In this testcase, the difference was not measurable though. – Gregor de Cillia Jun 29 '17 at 11:07
  • @GregordeCillia sorry if you were intending on answering (hard to detect sarcasm or lack thereof in text).\n I didn't know there was a different internal storage method for row-wise vs column-wise calculations? Do you just mean for data.frames (ie: columns as list items) or matrices as well? – rosscova Jun 29 '17 at 11:12
  • There was no sarcasm in my comment. When I worked in C++ I found out that the memory locations of columns were connected while the ones of rows were not. This boostet performance by 1-2% in some very extreme cases. In R the same is true. I just tested it on matrices – Gregor de Cillia Jun 29 '17 at 11:19