The existing answers seem slightly overcomplicated.
factorial already exists as a built-in function, and it is vectorized, so if you have some data you can simply pass it to the function. If you want negative numbers to return 0, this can also be incorporated with a logical index. Note that I am using the built-in function factorial below rather than the one in the question.
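dat <- round(runif(1000, -10, 10))
#negative values keep the default 0; zero is included because factorial(0) is 1
dat_non_negative <- dat >= 0
fact_vector <- numeric(1000)
#fill only the non-negative positions instead of overwriting the whole vector
fact_vector[dat_non_negative] <- factorial(dat[dat_non_negative])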
Now, if you are simply doing this as an exercise to learn, you can vectorize the function quite simply using the same idea, avoiding unnecessary for loops. Use a single loop over the factors 2, 3, ..., max(x) rather than over the elements, and in each iteration multiply every element that still needs that factor.
R_factorial <- function(x){
  if(!is.numeric(x) || length(dim(x)))
    stop("x must be a numeric vector!")
  #create an output vector
  output <- numeric(NROW(x))
  #set initial value: 0! and 1! are both 1, negative input is undefined
  output[x >= 0] <- 1
  output[x < 0] <- NA
  #Find the max factor (using only integer values, not gamma approximations)
  mx <- max(round(x))
  #Multiply in each factor i, only for the elements that still need it
  #(seq_len(...)[-1] is 2, ..., mx and is empty when mx < 2, unlike seq(2, mx))
  for(i in seq_len(max(mx, 0))[-1]){
    output[x >= i] <- output[x >= i] * i
  }
  #return output
  output
}
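To make the loop over factors concrete, here is a minimal sketch of the same idea on a tiny, made-up input, written out without the function wrapper. The values in x are only an illustration and the sketch assumes max(x) is at least 2.

```r
# Illustrative input (assumed, not from the question)
x <- c(1, 3, 5)

# Every element starts at 1
output <- rep(1, length(x))

# One pass per factor, not per element: in pass i, every element with
# x >= i is multiplied by i in a single vectorized assignment
for (i in 2:max(x)) {
  needs_factor <- x >= i
  output[needs_factor] <- output[needs_factor] * i
}

output
# [1]   1   6 120
```

The number of iterations depends on the largest value in x, not on the length of x, which is why this stays cheap for long vectors of small numbers.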
A few things to note:
- Allocate the entire vector first using output <- numeric(length), where length is the number of outputs (e.g. length(x) here, or more generally NROW(x)).
- Use the R constant NA for missing or undefined values instead of "NA". The first is compatible with a numeric vector, while the latter will turn your vector into a character vector.
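A small sketch of both notes, using made-up vectors (the names num_out and chr_out are only for illustration):

```r
# Preallocate a numeric vector of the final length
num_out <- numeric(3)
num_out[2] <- NA          # the NA constant keeps the vector numeric
is.numeric(num_out)       # TRUE
is.na(num_out)            # FALSE TRUE FALSE

# Assigning the string "NA" coerces the whole vector to character
chr_out <- numeric(3)
chr_out[2] <- "NA"
is.numeric(chr_out)       # FALSE
is.character(chr_out)     # TRUE
is.na(chr_out)            # FALSE FALSE FALSE, "NA" is just text
```

Once the vector is character, arithmetic such as chr_out * 2 no longer works, so the mistake tends to surface far from where it was made.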
Now the alternative answers suggest lapply or vapply. This is more or less the same as looping over every value in the vector and applying the function to each one. As such it is often a slow (but very readable!) way to vectorize a function. If it can be avoided, you can often gain a speed boost. For loops and the apply family are not necessarily bad, but they are in general a lot slower than vectorized functions. See this Stack Overflow page, which explains why in a very easily understood manner.
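As a quick sketch of what "more or less the same" means here, the three calls below give the same numbers on a small made-up vector; they differ only in how the work is dispatched and in the shape of the result. The built-in factorial is used as the function being applied.

```r
# Illustrative input (assumed)
x <- c(1, 3, 5, 10)

via_lapply <- lapply(x, factorial)                          # list, one call per element
via_vapply <- vapply(x, factorial, FUN.VALUE = numeric(1))  # numeric vector, one call per element
via_vector <- factorial(x)                                  # numeric vector, a single call

is.list(via_lapply)                       # TRUE
all.equal(unlist(via_lapply), via_vector) # TRUE
all.equal(via_vapply, via_vector)         # TRUE
```

The per-element variants call the R function length(x) times, while the vectorized call enters it once.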
An additional alternative, which has also been suggested, is the Vectorize function. This is a quick-and-dirty solution. In my experience it is often slower than a simple loop, and it may have unexpected side effects on functions with multiple arguments. It is not necessarily bad, as one often gains readability in the underlying code.
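For reference, a minimal sketch of the Vectorize approach. The scalar function below is a hypothetical stand-in written for this example, not the function from the question.

```r
# Hypothetical scalar-only factorial: the if() conditions only make
# sense for a single value
scalar_factorial <- function(n) {
  if (n < 0) return(NA_real_)
  if (n < 2) return(1)
  prod(seq_len(n))
}

# Vectorize() wraps the function in mapply(), so it is still called
# once per element behind the scenes
v_factorial <- Vectorize(scalar_factorial)

v_factorial(c(0, 3, 5))
# [1]   1   6 120
```

The wrapper is convenient because the scalar function stays untouched, but the per-element calls are the reason it does not get the speed of a truly vectorized implementation.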
Speed comparison
Now the vectorized version is a lot faster than the alternative answers. Using the microbenchmark function from the microbenchmark package, we can see exactly how much faster (note that here I am using the factorial function from the question description):
microbenchmark::microbenchmark(R_factorial = R_factorial(x),
                               Vapply = vapply(x,
                                               factorial,
                                               FUN.VALUE = numeric(1)),
                               Lapply = lapply(x, factorial),
                               Vfactorial = Vfactorial(x))
Unit: microseconds
          expr      min        lq      mean    median       uq      max neval
   R_factorial  186.525  197.2870  232.2394  212.9565  241.464  395.706   100
        Vapply 2209.982 2354.5960 3004.9264 2428.7905 3842.265 6165.144   100
        Lapply 2182.041 2299.0920 2584.3881 2374.9855 2430.867 5061.852   100
 Vfactorial(x) 2381.027 2505.4395 2842.9820 2595.3040 2669.310 5920.094   100
As one can see, R_factorial is roughly 11 times faster than vapply or lapply, comparing medians (2428.8 / 212.96 = 11.4 for vapply). This is quite a large speed boost. Further improvements could be made to speed it up even more (e.g. factorial approximation algorithms, Rcpp and other options), but for this example it might suffice.
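As one example of an "other option", here is a hedged sketch of a lookup-table variant. It is a different technique from R_factorial above, it was not part of the benchmark, and whether it is actually faster would need to be measured. The name lookup_factorial is made up for this example, and the sketch assumes a non-empty vector of non-negative whole numbers.

```r
# Hypothetical alternative: build all factorials up to max(x) once with
# cumprod(), then pick the needed entries by index
lookup_factorial <- function(x) {
  stopifnot(is.numeric(x), length(x) > 0, all(x >= 0), all(x == round(x)))
  # tab[k + 1] holds k!, so tab[1] is 0! = 1
  tab <- c(1, cumprod(seq_len(max(x))))
  tab[x + 1]
}

lookup_factorial(c(0, 1, 5, 10))
# [1]       1       1     120 3628800
```

The table costs max(x) multiplications regardless of length(x), and the final step is a single vectorized index operation.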