I would like to build a new column in a data.frame myDF that holds, for each row, the value returned by a function getval taking the elements of that row as arguments. getval also takes an external vector v1 as an argument. For example:
myn = 1000
a = seq(0, 1, length.out = myn)
b = seq(-1, 1, length.out = myn)
myDF = expand.grid(a=a, b=b)
set.seed(13)
v1 = rnorm(100)
getval = function(a, b, v) {
return(sum(a*v + b/2*v))
}
myDF$val = apply(myDF, 1, function(x) {getval(a=x[1], b=x[2], v=v1)})
head(myDF)
# a b val
# 1 0.000000000 -1 3.091267
# 2 0.001001001 -1 3.085078
# 3 0.002002002 -1 3.078889
# 4 0.003003003 -1 3.072700
# 5 0.004004004 -1 3.066512
# 6 0.005005005 -1 3.060323
But this is too slow: about 4 seconds here, and the time grows quickly for larger myn, since the grid has myn^2 rows.
I am looking for the fastest way to implement this - Contest! ;-)
All solutions (including parallelization?) and packages (dplyr, data.table?) are welcome. I really need something as fast as possible, for example for myn = 5000.
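For the getval shown above, one possible shortcut is that the sum factors: sum(a*v + b/2*v) equals (a + b/2) * sum(v), so the whole column can be computed with plain vectorized arithmetic. A minimal sketch, assuming the definitions above (a smaller myn is used only to keep the apply() cross-check quick; val_apply and val_vec are illustrative names, not part of the original code):

```r
myn <- 200  # smaller than in the question, only so the check below runs quickly
a <- seq(0, 1, length.out = myn)
b <- seq(-1, 1, length.out = myn)
myDF <- expand.grid(a = a, b = b)
set.seed(13)
v1 <- rnorm(100)

getval <- function(a, b, v) sum(a * v + b / 2 * v)

# Row-by-row reference, as in the question
val_apply <- apply(myDF, 1, function(x) getval(a = x[1], b = x[2], v = v1))

# sum(a*v + b/2*v) == (a + b/2) * sum(v), so sum(v1) is computed only once
val_vec <- (myDF$a + myDF$b / 2) * sum(v1)

# Compare the two results up to floating-point tolerance
all.equal(unname(val_apply), val_vec)
```

This only works because a and b can be pulled out of the sum; it does not carry over unchanged to a getval where they are entangled with v.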
EDIT
Actually, getval is not so (easily?) vectorizable:
getval = function(a, b, v) {
return(sum(a/(a/v +1) + b/(b+2) * v))
}
myDF$val = apply(myDF, 1, function(x) {getval(a=x[1], b=x[2], v=v1)})
head(myDF)
# a b val
# 1 0.000000000 -1 6.182533
# 2 0.001001001 -1 6.282782
# 3 0.002002002 -1 6.383424
# 4 0.003003003 -1 6.484682
# 5 0.004004004 -1 6.586980
# 6 0.005005005 -1 6.691260
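One observation that may help with this second getval: the sum splits into a part that depends only on a and a part that depends only on b, and expand.grid() repeats just myn distinct values of each. The sketch below relies on that, and on myDF coming from expand.grid(a = a, b = b), where a varies fastest. It computes each part once per distinct value and combines them with outer(). The names fa, gb, val_apply and val_grid are illustrative, and myn is reduced only so the apply() cross-check stays quick:

```r
myn <- 200  # smaller than in the question, only so the check below runs quickly
a <- seq(0, 1, length.out = myn)
b <- seq(-1, 1, length.out = myn)
myDF <- expand.grid(a = a, b = b)
set.seed(13)
v1 <- rnorm(100)

getval <- function(a, b, v) sum(a / (a / v + 1) + b / (b + 2) * v)

# Row-by-row reference, as in the question
val_apply <- apply(myDF, 1, function(x) getval(a = x[1], b = x[2], v = v1))

# Part depending only on a: one sum over v1 per distinct a (myn sums, not myn^2)
fa <- vapply(a, function(ai) sum(ai / (ai / v1 + 1)), numeric(1))

# Part depending only on b: sum(b/(b+2) * v) == b/(b+2) * sum(v)
gb <- b / (b + 2) * sum(v1)

# outer() fills column-major, so a varies fastest, matching expand.grid()
val_grid <- as.vector(outer(fa, gb, "+"))

# Compare the two results up to floating-point tolerance
all.equal(unname(val_apply), val_grid)
```

For myn = 5000 this would need 5000 sums over v1 instead of 25 million, although the 25-million-element result vector still has to be allocated.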