
I have a very large dataframe...

v.l.df <- data.frame(seq(0, 10, 0.0001),seq(0, 10, 0.0001),seq(0, 10, 0.0001))

...and a function with some if statements and calculations...

a.f <- function(cell_value,action){
  if(action == 1){
    cell_value * 1
  }

  else if(action == 2){
    cell_value * 5
  }
}

I now want to apply this function to the first two columns of my v.l.df, row by row, and sum the returned values. The new columns should thus contain (pseudo code):

new_col_1                                    new_col_2
a.f(v.l.df[1,1],1) + a.f(v.l.df[1,2],1)      a.f(v.l.df[1,1],2) + a.f(v.l.df[1,2],2)
a.f(v.l.df[2,1],1) + a.f(v.l.df[2,2],1)      a.f(v.l.df[2,1],2) + a.f(v.l.df[2,2],2)
...

How can this be achieved? I am struggling with passing multiple arguments when using apply and with summing the values returned by the function.

EDIT: Changed the example function. It should now return the following:

> a.f(2,1)
[1] 2
> a.f(2,2)
[1] 10
user3347232
  • What does your function do? As it is coded, it doesn't return anything. Try to run `cell.test = 0.7` , `action = 1` and `a.f(cell.test)` should it return 100 or 70? – Bernardo Nov 12 '14 at 15:42
  • Sorry, I simplified the function in a wrong way. I have changed the function `a.f` in the example and it should work now. – user3347232 Nov 12 '14 at 16:56
  • is the first element of new_col_2 meant to be `a.f(v.l.df[1,1],2) + a.f(v.l.df[1,2],2)` ? – vpipkt Nov 12 '14 at 17:23
  • yes! I corrected the example in the question. – user3347232 Nov 13 '14 at 09:38

2 Answers


I'd do this in a couple of steps. You can reduce to fewer steps, but I prefer to keep it more readable:

First, apply a.f to every cell of the first two columns of v.l.df twice, once with action=1 and once with action=2 (to pass additional arguments through apply, just put them after FUN):

action.1 = apply(v.l.df[,1:2], c(1,2), FUN = a.f, action=1)

action.2 = apply(v.l.df[,1:2], c(1,2), FUN = a.f, action=2)

Then apply rowSums to both action.1 and action.2 and store the results in the same data.frame:

v.l.df$new.1 = rowSums(action.1)         #or v.l.df$new.1 = apply(action.1,1,sum)
v.l.df$new.2 = rowSums(action.2)         #or v.l.df$new.2 = apply(action.2,1,sum)
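To make the two steps easier to check by hand, here is a minimal sketch on a three-row stand-in for v.l.df. The name small.df, its values, and its column names are assumptions made up for this illustration; a.f is copied from the question.

```r
# a.f as defined in the question
a.f <- function(cell_value, action) {
  if (action == 1) {
    cell_value * 1
  } else if (action == 2) {
    cell_value * 5
  }
}

# Hypothetical small stand-in for v.l.df (names and values are illustrative only)
small.df <- data.frame(V1 = c(1, 2, 3), V2 = c(10, 20, 30), V3 = c(0, 0, 0))

# MARGIN = c(1, 2) calls a.f once per cell; action is passed on after FUN
action.1 <- apply(small.df[, 1:2], MARGIN = c(1, 2), FUN = a.f, action = 1)
action.2 <- apply(small.df[, 1:2], MARGIN = c(1, 2), FUN = a.f, action = 2)

# Each result is a 3 x 2 matrix, so rowSums gives one value per row
small.df$new.1 <- rowSums(action.1)
small.df$new.2 <- rowSums(action.2)

print(small.df)
```

Naming MARGIN explicitly is only a readability choice here; it is the same second argument as the positional c(1,2) above.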
Bernardo
  • I get this error when executing action.1: `Error in if (d2 == 0L) { : missing value where TRUE/FALSE needed > traceback() 1: apply(v.l.df[, 1:2], c(1, 2), FUN = a.f, action = 1)` – user3347232 Nov 13 '14 at 09:54
  • Apparently the `MARGIN` argument is missing. A `2` should do the trick. – user3347232 Nov 13 '14 at 11:46
  • The `c(1,2)` is the `margin` argument; it just wasn't explicit. Nevertheless, I didn't get the same error with my code using your example data. – Bernardo Nov 13 '14 at 18:43

I believe your result is achieved by:

v.l.df$new_col_1 <- a.f(v.l.df$V1, 1) + a.f(v.l.df$V2, 1)
v.l.df$new_col_2 <- a.f(v.l.df$V1, 2) + a.f(v.l.df$V2, 2)

This assumes your first two columns are named V1 and V2, respectively.
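This works without apply because action is a single value, so the if inside a.f is evaluated once and the multiplication then runs over the whole column at once. Note that the data frame in the question is built without explicit column names, so V1 and V2 may not exist there. A minimal sketch that indexes the columns by position instead (small.df and its values are made up for illustration):

```r
# a.f as defined in the question
a.f <- function(cell_value, action) {
  if (action == 1) {
    cell_value * 1
  } else if (action == 2) {
    cell_value * 5
  }
}

# Hypothetical small data frame, built without column names like v.l.df
small.df <- data.frame(c(1, 2, 3), c(10, 20, 30), c(0, 0, 0))

# [[1]] and [[2]] select the first and second column regardless of their names;
# a.f receives a whole column and returns a whole column
small.df$new_col_1 <- a.f(small.df[[1]], 1) + a.f(small.df[[2]], 1)
small.df$new_col_2 <- a.f(small.df[[1]], 2) + a.f(small.df[[2]], 2)

print(small.df[, c("new_col_1", "new_col_2")])
```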

You may also define another function

a.f.2 <- function(val1, val2, method) {
    a.f(val1, method) + a.f(val2, method)
}

And apply it as follows

v.l.df$new_col_1 <- a.f.2(v.l.df$V1, v.l.df$V2, 1)
v.l.df$new_col_2 <- a.f.2(v.l.df$V1, v.l.df$V2, 2)

You can also write this summary function with a ... argument so that it handles any number of columns. The example below expects the columns to be passed together as a single data frame, and does not check that they are:

a.f.n<- function(method,...){
    rowSums(sapply(...,a.f,method))
}

Then apply this as follows (here to the first 1000 columns, which assumes the data frame has at least that many):

v.l.df$new_col_1 <- a.f.n(v.l.df[,1:1000], method=1)
v.l.df$new_col_2 <- a.f.n(v.l.df[,1:1000], method=2)

I am not sure how efficient this will be, but it is compact. :-)
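For a version that can be run on a small example, here is a sketch of the same sapply plus rowSums idea with the data frame and the column selection passed as explicit arguments instead of through ... . The helper a.f.cols, the data frame wide.df, and its values are hypothetical names and data chosen only for this illustration.

```r
# a.f as defined in the question
a.f <- function(cell_value, action) {
  if (action == 1) {
    cell_value * 1
  } else if (action == 2) {
    cell_value * 5
  }
}

# Hypothetical helper: apply a.f to the selected columns and sum per row.
# Intended for data frames with more than one row, where sapply returns a matrix.
a.f.cols <- function(df, cols, method) {
  # sapply calls a.f once per selected column and binds the results column-wise
  rowSums(sapply(df[, cols, drop = FALSE], a.f, action = method))
}

# Illustrative data: 3 rows, 4 columns (V1..V4) filled column by column with 1..12
wide.df <- as.data.frame(matrix(1:12, nrow = 3))

# Use only the first three columns, analogous to "the first 1000 or so"
wide.df$new_col_1 <- a.f.cols(wide.df, 1:3, method = 1)
wide.df$new_col_2 <- a.f.cols(wide.df, 1:3, method = 2)

print(wide.df)
```

Passing the column range as an argument means the same call can be reused with 1:1000 on the real data, provided those columns exist.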

vpipkt
  • The thing is that I only posted an example. The real `v.l.df` has about 10,000 columns (instead of three) and I want to use this action for the first 1000 or so instead of the first two. How can your answer be adapted to that fact? – user3347232 Nov 13 '14 at 09:58
  • One approach, as mentioned, is to build the second function to accept an arbitrary number of functions. This is a bit beyond my knowledge at this point. I'm looking at this: http://stackoverflow.com/questions/3057341/how-to-use-rs-ellipsis-feature-when-writing-your-own-function. – vpipkt Nov 13 '14 at 14:03