Code Building Process and Embedded Functions

Question

First, this question isn't about solving a specific problem. As a newcomer to R, I'm working on writing more efficient code and improving my code-building process. Getting perspectives on different programming methods, and even styles, is the reason behind this question.

Below are three ways to code the same computation.

First, here is the example data:

stackexample <- c(52,50,45,49.5,50.5,12,10,14,11.5,12,110,108,106,101,104)
dim(stackexample)<- c(5,3)

Method One: Do the math in the function without defining any objects

ertimesIVCV1 <- function (x)
{ (solve(var(log((x[-nrow(x),])/(x[-1,])))))%*%
  ((1+(log(x[1,]/(x)[nrow(x),])))^(1/nrow(x))-1)}

ertimesIVCV1(stackexample)

Method Two: Define Objects in the function and then manipulate those objects

ertimesIVCV2 <- function (x)
{ IVCV <- solve(var(log((x[-nrow(x),])/(x[-1,]))));
  retsexcess <- (1+(log(x[1,]/(x)[nrow(x),])))^(1/nrow(x))-1;
  IVCV%*%retsexcess}

ertimesIVCV2(stackexample)

Method Three: Define several functions and call them from a "summary-like" function

IVCV <- function (x) {solve(var(log((x[-nrow(x),])/(x[-1,]))))}
retsexcess <- function(x) (1+(log(x[1,]/(x)[nrow(x),])))^(1/nrow(x))-1
ertimesIVCV3 <- function (x) {IVCV(x)%*%retsexcess(x)}

ertimesIVCV3(stackexample)

All three produce the same answer:

           [,1]
[1,]  1.4430104
[2,] -0.1365155
[3,] 11.8088378

but, as you can see, through three different approaches.

Is there such a thing as an optimal number of embedded functions, or should we always try to write all the math out explicitly? How many levels of functions within functions is optimal? Is any of these methods superior in computational speed? Is there a rule of thumb for this? How do you approach it? Any comments, suggestions, or links would be welcome. Thank you!

Rye

Comment by Aaron, Feb 28 '12 at 19:41: See http://stackoverflow.com/q/4406873/210673, especially @GavinSimpson's answer.

score 6 · Answer 1 · answered Feb 28 '12 at 23:04

If the goal is time efficiency, then for the examples offered the answer is "who cares?". The overhead of the extra function calls is not what determines efficiency here: in the benchmark below, all three versions take essentially the same time. You should probably focus on other issues, such as how easily users can understand the code and how easily it can be maintained.

library(rbenchmark)   # add-on package: install.packages("rbenchmark")
benchmark(replications = 100,
          ver1 = ertimesIVCV1(stackexample),
          ver2 = ertimesIVCV2(stackexample),
          ver3 = ertimesIVCV3(stackexample))
# ------------------
  test replications elapsed relative user.self sys.self user.child sys.child
1 ver1          100   0.030 1.000000      0.03        0          0         0
2 ver2          100   0.030 1.000000      0.03        0          0         0
3 ver3          100   0.031 1.033333      0.03        0          0         0
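
rbenchmark is an add-on package. A rough base-R check along the same lines can be sketched with system.time() and a plain loop; this is my own sketch, not part of the answer. Only Methods One and Three are compared, since they are the two extremes (no helpers vs. two extra function calls), and the repetition count of 1000 is an arbitrary choice. Timings vary by machine, so no numbers are claimed here.

```r
# Example data from the question: a 5 x 3 matrix, filled column by column
stackexample <- c(52, 50, 45, 49.5, 50.5, 12, 10, 14, 11.5, 12,
                  110, 108, 106, 101, 104)
dim(stackexample) <- c(5, 3)

# Method One: the whole computation in a single expression
ertimesIVCV1 <- function(x) {
  solve(var(log(x[-nrow(x), ] / x[-1, ]))) %*%
    ((1 + log(x[1, ] / x[nrow(x), ]))^(1 / nrow(x)) - 1)
}

# Method Three: two helpers called from a wrapper (two extra function calls)
IVCV <- function(x) solve(var(log(x[-nrow(x), ] / x[-1, ])))
retsexcess <- function(x) (1 + log(x[1, ] / x[nrow(x), ]))^(1 / nrow(x)) - 1
ertimesIVCV3 <- function(x) IVCV(x) %*% retsexcess(x)

# Both versions return the same 3 x 1 matrix
stopifnot(isTRUE(all.equal(ertimesIVCV1(stackexample),
                           ertimesIVCV3(stackexample))))

# Time many repetitions of each with base R's system.time(); the absolute
# numbers depend on the machine, so only the comparison is of interest.
reps <- 1000
t1 <- system.time(for (i in seq_len(reps)) ertimesIVCV1(stackexample))
t3 <- system.time(for (i in seq_len(reps)) ertimesIVCV3(stackexample))
print(c(method1 = t1[["elapsed"]], method3 = t3[["elapsed"]]))
```

If the two elapsed times are close, as in the rbenchmark table, the extra function calls are not where the time goes.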

Accepted Answer · flodel · answered Feb 29 '12 at 03:23

IMHO, speed should be the last of your concerns when writing code, especially if you are a beginner. Your primary focus should instead be on simplicity, readability, and modularity. Don't get me wrong: efficiency is a great thing, and you'll find many ways to make your code faster when needed, but it should not be a priority in itself.

So I'll mostly give tips about style. To illustrate, here is what my version of your code would look like. Please bear in mind that I do not know what your code computes, so I did my best to break it up using meaningful variable names.

IVCV <- function(stack) {

## This function computes [...] IVCV stands for [...]
## Inputs:
##    - stack: a matrix where each column [...]
## Output: a matrix [...]

   n <- nrow(stack) # stack size
   stack.ratios  <- stack[-n, ] / stack[-1, ]
   log.ratios    <- log(stack.ratios)
   ivcv          <- solve(var(log.ratios))

   return(ivcv)
}

ExcessReturn <- function(stack) {

## This function computes [...]
## Inputs:
##    - stack: a matrix where each column [...]
## Output: a vector [...]

   n <- nrow(stack) # stack size
   total.ratio   <- stack[1, ] / stack[n, ]
   excess.return <- (1 + log(total.ratio)) ^ (1 / n) - 1

   return(excess.return)
}

ExcessReturnTimesIVCV <- function(stack) {

## This function computes [...] IVCV stands for [...]
## Inputs:
##    - stack: a matrix where each column [...]
## Output: a one-column matrix [...]

   return(IVCV(stack) %*% ExcessReturn(stack))
}
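
A minimal sketch of what per-function tests (point 1 in the list that follows) could look like for this refactoring, using only base R's stopifnot(); a package such as testthat is a common alternative. The functions are condensed copies of the ones above (comment headers dropped) so the block runs on its own. The three checks are my own, derived from the formulas rather than taken from the answer, and `flat` is a made-up input.

```r
# Example data from the question
stackexample <- c(52, 50, 45, 49.5, 50.5, 12, 10, 14, 11.5, 12,
                  110, 108, 106, 101, 104)
dim(stackexample) <- c(5, 3)

# Method One from the question, kept as the reference result
ertimesIVCV1 <- function(x) {
   solve(var(log(x[-nrow(x), ] / x[-1, ]))) %*%
      ((1 + log(x[1, ] / x[nrow(x), ]))^(1 / nrow(x)) - 1)
}

# Condensed copies of the answer's small functions, one step per statement
IVCV <- function(stack) {
   n          <- nrow(stack)                  # stack size
   log.ratios <- log(stack[-n, ] / stack[-1, ])
   solve(var(log.ratios))
}

ExcessReturn <- function(stack) {
   n           <- nrow(stack)                 # stack size
   total.ratio <- stack[1, ] / stack[n, ]
   (1 + log(total.ratio))^(1 / n) - 1
}

ExcessReturnTimesIVCV <- function(stack) IVCV(stack) %*% ExcessReturn(stack)

# Test 1: splitting the one-liner into pieces did not change the result
stopifnot(isTRUE(all.equal(ExcessReturnTimesIVCV(stackexample),
                           ertimesIVCV1(stackexample))))

# Test 2: if the first and last rows are equal, the total ratio is 1,
# log(1) is 0, and ExcessReturn() must be 1^(1/n) - 1 = 0 for each column
flat <- matrix(c(10, 12, 11, 10,
                 20, 18, 21, 20), ncol = 2)
stopifnot(isTRUE(all.equal(ExcessReturn(flat), c(0, 0))))

# Test 3: IVCV() should be the inverse of the covariance of the log ratios
log.ratios <- log(stackexample[-5, ] / stackexample[-1, ])
stopifnot(isTRUE(all.equal(IVCV(stackexample) %*% var(log.ratios), diag(3))))
```

Because each piece is its own function, each test targets one small formula, and a failure points straight at the piece that broke.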

1) Yes, break your code into small functions. This helps readability, flexibility, and maintenance. It also makes unit testing easier, since you can design tests for each elementary piece of code.

2) Document a function by placing comments that give its description, inputs, and output inside the body of the function. This way, after the function is created, the user can see its description as part of the function's printout (e.g., just type ExcessReturnTimesIVCV at the R console). This relies on R keeping the function's source code, which it does by default in interactive sessions.

3) Break complex expressions into multiple statements. Right now, all three of your versions are hard to understand, with too many things going on in each line. Each statement should do one simple thing so that it reads easily. Creating more intermediate objects is unlikely to slow your code down, and it will make debugging much easier.

4) Your object names are key to making your code clear. Choose them well and follow a consistent convention. I use UpperCamelCase for my own functions' names, and lowercase words separated by dots for most other objects.

5) Add comments, especially where 3) and 4) are not enough to make the code clear. In my example, I chose to use a variable n. This goes against the recommendation that variable names be descriptive, but it makes the code a little lighter and gives expressions like stack[-n, ] / stack[-1, ] a nice symmetry. Since n is not a descriptive name, I added a comment explaining its meaning. I would also have added more comments if I knew what the functions were really doing.

6) Use consistent syntax rules, mostly to improve readability. You'll hear different opinions about what to use here; in general, there is no single best approach. The most important thing is to make a choice and stick with it. Here are my suggestions:

a) One statement per line, no semicolons.

b) Consistent spacing and indentation (no tabs). I put spaces after commas and around binary operators. I also use extra spacing to line things up when it helps readability.

c) Consistent bracing: be careful how you place curly brackets to define blocks, otherwise you are likely to run into problems in script mode. See Section 8.1.43 of The R Inferno (a great reference).
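
One well-known instance of the bracing problem in point c) is a top-level if/else where `else` starts its own line. This is my own illustration, not taken from The R Inferno. The sketch stores both layouts as text and uses base R's parse() to show how R reads them at the top level, as when a script is run; the names `bad.style`, `good.style`, and `parses` are hypothetical.

```r
# Two layouts of the same if/else, stored as text so that parse() can
# show how R reads them at the top level (as when a script is run).
bad.style <- "x <- 1
if (x > 0) {
  y <- 1
}
else {
  y <- -1
}"

good.style <- "x <- 1
if (x > 0) {
  y <- 1
} else {
  y <- -1
}"

# Hypothetical helper: TRUE if the text parses without a syntax error
parses <- function(code) {
  tryCatch({
    parse(text = code)
    TRUE
  }, error = function(e) FALSE)
}

parses(bad.style)   # FALSE: the "}" line completes the if, so "else" is unexpected
parses(good.style)  # TRUE: "} else {" keeps the if/else in one expression

# Wrapped in braces (e.g. inside a function body), the parser keeps
# reading after the newline, so the same layout is accepted there
parses(paste("{", bad.style, "}", sep = "\n"))  # TRUE
```

This is why code that works inside a function can fail when the same lines are pasted or sourced at the top level; putting `} else {` on one line avoids the problem everywhere.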

Good luck!

score 4 · Answer 3 · answered Feb 29 '12 at 00:13

I disagree with DWin (though not really; I'm just putting a different spin on it). If the goal is your own time efficiency, then there are a few cases. If you're doing something once, then I agree with "who cares?" Do whatever you want, or whatever you think of at the time, probably Method 1 or 2.

The advantage of Method 3 is reusability. If you're typing the same code out more than a couple of times, you are losing efficiency. Put it in a function, and save yourself the typing and, especially, the possibility of mistyping. I see that you're already putting things in functions, but would your IVCV function come in handy as a utility or inside other functions? If not, don't bother with it.
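
A minimal sketch of the reuse argument: once IVCV is a function, applying it to several data sets is one lapply() call rather than several retyped copies of the formula. The second data set here is hypothetical (the example columns reordered), chosen only because the expected relationship between the two results is easy to check.

```r
# IVCV written once as a reusable utility (same formula as the question)
IVCV <- function(x) {
   n <- nrow(x)
   solve(var(log(x[-n, ] / x[-1, ])))
}

stackexample <- c(52, 50, 45, 49.5, 50.5, 12, 10, 14, 11.5, 12,
                  110, 108, 106, 101, 104)
dim(stackexample) <- c(5, 3)

# A hypothetical second data set: the same columns in a different order
p <- c(3, 1, 2)
reordered <- stackexample[, p]

# One call per data set instead of retyping the formula each time
ivcv.list <- lapply(list(original = stackexample, reordered = reordered), IVCV)

# Sanity check: reordering the columns just reorders the rows and columns
# of the inverse covariance matrix
stopifnot(isTRUE(all.equal(ivcv.list$reordered, ivcv.list$original[p, p])))
```

If IVCV never gets reused like this, the answer's advice applies: the extra function adds little.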

The bigger a project gets, the better it becomes to break it into pieces that get their own function. This can make organization, debugging, and modification go much more smoothly.
