Use sapply or Vector?

Question

I'm new to R, and I'm not sure whether I should pass a vector directly to a function or use sapply to feed the vector's elements to the function one at a time.

Aren't they the same thing? If so, why does sapply exist? Are there times when I should use one approach over the other, and how do I know which one to use?

This question came up because I was writing this:

sapply(1:3, function(i) dnorm(i,0,1))

Then I accidentally discovered that I could do

dnorm(1:3,0,1)

How could I have known this without discovering it by accident, so that I don't make the same mistake with other functions?

After discovering that, I tried to change this code in the same way

kappa <- c(1,2,3,4,5,6,7)

sapply(kappa, function(t) 
  optimize(function(x) (t*x^22+5*x+6), c(-10,10))$minimum)

to this

kappa <- c(1,2,3,4,5,6,7)

  optimize(function(x) (kappa*x^22+5*x+6), c(-10,10))$minimum

but it didn't work!

Please, I need a good explanation.

Thank you

Take a look at the function documentation. The first argument of `dnorm` is a vector, so it is pointless to use `sapply`. The function `optimize`, on the other hand, does not accept a vector as its argument. - davide, Jan 25 '19 at 14:38

Parfait · Accepted Answer · 2019-01-25T15:01:21.477

Fundamentally, sapply and its siblings in the apply family are loops that build a vector, matrix, or list from a multi-item object. See the canonical answer on this subject: Is the "*apply" family really not vectorized? However, some operations are vectorized (i.e., the loop runs in compiled code such as C or Fortran), so they can take a whole vector or list and run very quickly.

The non-looped version will almost always run faster. The timings below are for a much larger input sequence.

system.time({sapply(1:300000, function(i) dnorm(i,0,1))})
#    user  system elapsed 
#   1.097   0.026   1.169

system.time({dnorm(1:300000,0,1)})
#    user  system elapsed 
#   0.006   0.001   0.007
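
As a quick sanity check, the two calls can also be compared on a small input to confirm they agree. This is only a minimal sketch; the names looped and vectorized are illustrative and not part of the original code.

```r
# Looped version: calls dnorm once per element and collects the results
looped <- sapply(1:3, function(i) dnorm(i, 0, 1))

# Vectorized version: a single call that handles the whole vector
vectorized <- dnorm(1:3, 0, 1)

# Compare the two results element by element (within numeric tolerance)
all.equal(looped, vectorized)
```

If the comparison holds, the sapply wrapper adds nothing except overhead.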

As you found out, dnorm is such a vectorized function. Many R functions accept vectors or lists and return outputs of equal length, including paste, lengths, toupper, [, the file.* family, the as.* family, and the grep family. However, more complex, multi-layered operations must be called once per input and return a single object each time, as you found out with optimize. Other non-vectorized functions include read.csv, write.csv, merge, lm, glm, and summary. For such functions, the apply family can call them iteratively and bind the results into a single object such as a vector, matrix, or list.

kappa <- seq(1,7)

sapply(kappa, function(i) optimize(function(x) (i^x^2+5*x+6), c(-10,10))$minimum)
# [1] -9.9999263 -1.2407389 -0.9122106 -0.7784485 -0.7022782 -0.6517733 -0.6151620
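
If a vector-accepting interface is preferred, one option is base R's Vectorize(), sketched below. The helper names min_for and min_for_vec are hypothetical, and the objective is the one used in the sapply call above. Note that Vectorize() still loops internally, so this is a convenience rather than a speed-up.

```r
# Hypothetical helper: minimize the objective for ONE value of i
min_for <- function(i) {
  optimize(function(x) i^x^2 + 5 * x + 6, c(-10, 10))$minimum
}

# Vectorize() wraps the helper so it accepts a vector of i values;
# under the hood it still calls min_for once per element
min_for_vec <- Vectorize(min_for)

kappa <- seq(1, 7)
min_for_vec(kappa)
```

This should give the same values as the explicit sapply call, since both evaluate optimize once per element of kappa.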

score 1 · Answer 2 · answered Jan 25 '19 at 14:46

In general, when you have a vector, you should use dnorm(1:3,0,1)-like syntax instead of sapply whenever the function supports it. It is simply faster and more elegant. The exception is when the function is not vectorised: either the help page states that the argument should be a single character/number, or it is your own function that you know is not vectorised.
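
When the help page is unclear, a rough empirical check is to compare one call on the whole vector against element-by-element calls. The helper below, acts_elementwise, is a hypothetical name made up for this sketch. It is a heuristic only and does not replace reading the documentation.

```r
# Hypothetical helper: TRUE if f(x) matches calling f on each element of x
acts_elementwise <- function(f, x) {
  # NULL if f refuses a vector input
  whole <- tryCatch(f(x), error = function(e) NULL)
  if (is.null(whole)) return(FALSE)
  each <- sapply(x, f)
  isTRUE(all.equal(whole, each, check.attributes = FALSE))
}

acts_elementwise(dnorm, 1:3)  # dnorm returns one density per element
acts_elementwise(sum, 1:3)    # sum collapses the vector to a single value
```

A function that passes this check on a few representative inputs can usually be called directly on the vector.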

sapply is nice for lists:

> sapply(list(c(1:5), 5), sum)
[1] 15  5
> sum(list(c(1:5), 5))
Error in sum(list(c(1:5), 5)) : invalid 'type' (list) of argument
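
A stricter sibling of sapply is vapply, which additionally checks the type and length of each result. The sketch below reuses the same list; the variable name totals is illustrative.

```r
# Same list as in the console session above
totals <- vapply(list(c(1:5), 5), sum, FUN.VALUE = numeric(1))

# vapply guarantees one numeric value per list element, or it raises an error
totals
```

The FUN.VALUE argument is the design point here: it makes the expected shape of each result explicit instead of letting sapply guess how to simplify.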

And apply for matrices:

> apply(matrix(1:4, 2, 2), 1, sum)
[1] 4 6
> apply(matrix(1:4, 2, 2), 2, sum)
[1] 3 7
> sum(matrix(1:4, 2, 2))
[1] 10
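
For the specific case of summing rows or columns, base R also ships the vectorised functions rowSums and colSums, so apply is not strictly needed there. A minimal sketch on the same matrix, where the variable name m is illustrative:

```r
m <- matrix(1:4, 2, 2)

# Vectorised equivalent of apply(m, 1, sum)
rowSums(m)

# Vectorised equivalent of apply(m, 2, sum)
colSums(m)
```

apply remains the general tool when the per-row or per-column function has no dedicated vectorised counterpart.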

And as @davide said in the comment, optimize does not take a vector as input.
