0

I have a function in R which takes a scalar and a vector as arguments, to perform some operation on them returning a single value.

Given a "series" of scalars (here, the vector mya) and a "series" of vectors (here, the matrix myv), how can I vectorize the call to myf so that each element in mya goes with the corresponding vector in myv?

mya = 1:3
myv = matrix(1:30, 10, 3)

myf = function(a, v) {
  return(sum(a / (a/v + 1)))
}

sapply(1:3, function(x) {myf(mya[x], myv[,x])})
# [1]  7.980123 17.649590 26.809440

So above I would like to avoid the looping sapply operation to do directly something like:

myf(mya, myv)
# [1] 49.37443   <- Here I would like 3 values

The big issue here is performance: in my real situation, mya and myv would have more than 10e6 values or vectors respectively, and myf is much more complex.

ztl
  • 2,512
  • 1
  • 26
  • 40

1 Answers1

2

Up front, your myv might be organized as a series of vectors, one column each; it is better for many tools to convert it into a list of vectors.

asplit(myv, 2)
# [[1]]
#  [1]  1  2  3  4  5  6  7  8  9 10
# [[2]]
#  [1] 11 12 13 14 15 16 17 18 19 20
# [[3]]
#  [1] 21 22 23 24 25 26 27 28 29 30

base R

sapply/lapply are to a single vector/list as mapply/Map are to n of them.

Map(myf, mya, asplit(myv , 2))
# [[1]]
# [1] 7.980123
# [[2]]
# [1] 17.64959
# [[3]]
# [1] 26.80944
mapply(myf, mya, asplit(myv , 2))
# [1]  7.980123 17.649590 26.809440

tidyverse

The order of arguments is different, and instead of individual arguments it needs all of them in a list itself.

purrr::pmap(list(mya, asplit(myv , 2)), myf)
# [[1]]
# [1] 7.980123
# [[2]]
# [1] 17.64959
# [[3]]
# [1] 26.80944
purrr::pmap_dbl(list(mya, asplit(myv , 2)), myf)
# [1]  7.980123 17.649590 26.809440

Alternative approach, given the comments.

This approach truly is vectorized, but has deconstructed the function a little.

colSums(t(mya / (mya / t(myv) + 1)))
# [1]  7.980123 17.649590 26.809440

To get to this point, one needs to recognize where transpose and such is necessary. I'll start with some known points:

mya[1] / myv[,1] + 1
#  [1] 2.000000 1.500000 1.333333 1.250000 1.200000 1.166667 1.142857 1.125000 1.111111 1.100000

In order to mimic that with matrices (and not just vectors), we might try

(mya / myv + 1)
#           [,1]     [,2]     [,3]
#  [1,] 2.000000 1.181818 1.142857
#  [2,] 2.000000 1.250000 1.045455
#  [3,] 2.000000 1.076923 1.086957
#  [4,] 1.250000 1.142857 1.125000
#  [5,] 1.400000 1.200000 1.040000
#  [6,] 1.500000 1.062500 1.076923
#  [7,] 1.142857 1.117647 1.111111
#  [8,] 1.250000 1.166667 1.035714
#  [9,] 1.333333 1.052632 1.068966
# [10,] 1.100000 1.100000 1.100000

But if you notice, the division of mya over myv is column-wise, so it is expanding to

c(mya[1] / myv[1,1], mya[2] / myv[2,1], mya[3] / myv[3,1], mya[1] / myv[4,1], ...)

where we would prefer it to be transposed. Okay, so we transpose it so that the rows of myv are vertical for the division.

(mya / t(myv) + 1)[1,]
#  [1] 2.000000 1.500000 1.333333 1.250000 1.200000 1.166667 1.142857 1.125000 1.111111 1.100000

That's better. Now we need to do the same for the next step. That brings us to

t(mya / (mya / t(myv) + 1))
#            [,1]     [,2]     [,3]
#  [1,] 0.5000000 1.692308 2.625000
#  [2,] 0.6666667 1.714286 2.640000
#  [3,] 0.7500000 1.733333 2.653846
#  [4,] 0.8000000 1.750000 2.666667
#  [5,] 0.8333333 1.764706 2.678571
#  [6,] 0.8571429 1.777778 2.689655
#  [7,] 0.8750000 1.789474 2.700000
#  [8,] 0.8888889 1.800000 2.709677
#  [9,] 0.9000000 1.809524 2.718750
# [10,] 0.9090909 1.818182 2.727273

Since you wanted to sum across each of the mya values. Knowing that we have three in mya and we see three columns, one might infer we need to sum each column. We can prove that empirically:

sum(mya[1] / (mya[1] / myv[,1] + 1))
# [1] 7.980123
colSums(t(mya / (mya / t(myv) + 1)))
# [1]  7.980123 17.649590 26.809440

But really, we don't need to tranpose then sum columns when we can not-transpose and sum the rows :-)

rowSums(mya / (mya / t(myv) + 1))
# [1]  7.980123 17.649590 26.809440
r2evans
  • 141,215
  • 6
  • 77
  • 149
  • Thanks! The `purrr::pmap_dbl` solution is nice and quite fast! But if I am not wrong, such mapping is not really a vectorization, so does this mean that the vectorization of a "series" (as a list for example, as you point out) of vectors is not possible? – ztl Feb 02 '21 at 13:43
  • This is a vectorization of multiple vectors/lists. What makes you say that this mapping is not so? – r2evans Feb 02 '21 at 13:46
  • I guess I am being confused as I don't really know what is happening behing the mapping, I suspect some looping in there but my own suspicion is a bit abstract. And perhaps I am not the only one puzzled there -> https://stackoverflow.com/questions/28983292/is-the-apply-family-really-not-vectorized?answertab=votes#tab-top – ztl Feb 02 '21 at 14:58
  • Applying a user-defined function over every element of a vector/list will almost certainly require a `for` loop somewhere in the stack. So technically you are correct: none of the `*apply*` functions (in base R or `purrr`) do true vectorization, not like `1:10 * 2` is vectorized. – r2evans Feb 02 '21 at 15:01
  • I think I launched into a different direction with the first iteration of the answer. Take a look at the second part, something that works with *this* user-defined function logic. It is not generally applicable across all UDFs, but works when it is mathematically straight-forward. I hope the walk-through helped. (This will be significantly faster than any `*apply` variant.) – r2evans Feb 02 '21 at 15:15
  • Great! The second part of your question is really what I was looking for - thanks (especially as I had already accepted the answer - very nice)! Indeed faster and truly vectorized. In case useful to others: in my real case, I played in addition with `tcrossprod` and `sweep` to reach my goal. Leads to faster solution. [and in the real end, computing everything in C++ through `Rcpp` was the fastest - https://stackoverflow.com/questions/66006146/fastest-and-most-efficient-way-to-pass-rows-of-a-large-data-frame-as-arguments-t] – ztl Feb 03 '21 at 10:25