
I've been learning R for a while now and have come across a lot of advice to programming types like myself to vectorize operations. Being a programmer, I'm interested in why and how it's faster. An example:

n <- 10^7
# populate with random numbers
v <- runif(n)
# vectorized multiply
system.time({vv <- v*v; m <- mean(vv)}); m
# explicit R-level loop
system.time({for(i in 1:length(v)) { vv[i] <- v[i]*v[i] }; m <- mean(vv)}); m

This gave

   user  system elapsed 
   0.04    0.01    0.07 
[1] 0.3332091

   user  system elapsed 
  36.68    0.02   36.69 
[1] 0.3332091

The most obvious thing to consider is that we're running native code, i.e. machine code compiled from C or C++, rather than interpreted code, as shown by the massive difference in user time between the two examples (circa 3 orders of magnitude). But is there anything else going on? For example, does R do:

  • Cunning native data structures, e.g. clever ways of storing sparse vectors or matrices so that we only do multiplications when we need to?

  • Lazy evaluation, e.g. on a matrix multiply, don't evaluate cells until they're actually needed.

  • Parallel processing.

  • Something else.

To test whether there might be some sparse vector optimization, I tried doing dot products with different vector contents:

# populate with random numbers
v <- runif(n)
system.time({m <- v %*% v / n}); m
# populate with a 1 followed by 99 0s, repeated
v <- rep(c(1, rep(0, 99)), n/100)
system.time({m <- v %*% v / n}); m
# populate with 0s
v <- rep(0, n)
system.time({m <- v %*% v / n}); m

However, there was no significant difference in time (roughly 0.09s elapsed in each case).

(Similar question for Matlab: Why does vectorized code run faster than for loops in MATLAB?)

TooTone

3 Answers


The most obvious thing to consider is that we're running native code, i.e. machine code compiled from C or C++, rather than interpreted code

That's most of it. The other big-ish component is that R code is functional in its design paradigm: functions (attempt to) have no side effects, which means that in some (but perhaps not all; R does try to be efficient about this) instances, calling `[<-` inside a for loop results in having to copy the entire object. That can get slow.
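
A minimal sketch of how you can watch this yourself, using the base function `tracemem` (which prints a message whenever R duplicates the traced object):

x <- 1:10
tracemem(x)   # report every duplication of x
x[5] <- 1     # assigning a double into an integer vector forces a copy
for (i in seq_along(x)) x[i] <- x[i] * 2   # later [<- calls are typically in place; no copy reported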

A small side note: R does have rather extensive functionality for handling sparse matrix structures efficiently, but they aren't the "default".
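
For instance, a minimal sketch using the Matrix package (which ships with R as a recommended package; the dimensions and entry counts here are arbitrary):

library(Matrix)
# a 10^4 x 10^4 matrix with only 100 non-zero entries, stored in
# compressed sparse column form rather than as 10^8 dense cells
M <- sparseMatrix(i = sample(1e4, 100), j = sample(1e4, 100),
                  x = runif(100), dims = c(1e4, 1e4))
object.size(M)          # a few KB; a dense double matrix would be ~800 MB
y <- M %*% rnorm(1e4)   # the multiply only visits the non-zero entries

This is the kind of structure you have to opt into explicitly; a plain numeric vector full of zeros, as in the question's test, is still stored and multiplied densely.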

joran
  • Thanks, I didn't quite follow what you meant by functions in a loop; would you be able to provide a simple example or a reference? (I'm interested generally in how the functional side of R affects things from a user's point of view -- so far everything I've done has been pretty procedural in style.) – TooTone Jun 03 '13 at 19:07
  • @TooTone `x <- 1:10; tracemem(x); x[5] <- 1` would be the simplest example, probably. – joran Jun 03 '13 at 19:20
  • That's subtle. I had no idea that was happening (there is a [discussion thread](http://r.789695.n4.nabble.com/Why-is-vector-assignment-in-R-recreates-the-entire-vector-td2403402.html), which notes that R mostly optimizes these copies away, in case anyone else wants to look into this further). I've done some testing, and in my code with the `for` loop, `vv` is copied once, when `i==1`, but this has no impact on performance. – TooTone Jun 03 '13 at 20:18
  • @TooTone Yes, as I mentioned, R does try hard (and often succeeds) in minimizing this copying, but you can easily imagine more complex situations in writing a long code block in a for loop where R can't know ahead of time whether an object is going to need to be modified or not, or referenced in the future, and so copying becomes inevitable. – joran Jun 03 '13 at 20:23
  • @TooTone In your case, `vv` exists before the loop version, and hence storage is already allocated. You still incur a large overhead of all the function calls to `*` and `[<-` and 3 `[` per iteration. – Gavin Simpson Jun 03 '13 at 20:23
  • @Gavin Simpson thanks, you're right. For some reason I can't fathom, the first iteration `vv[1]<-v[1]*v[1]` causes a copy (I broke the loop up to check) and the second through the last don't, but this is a subtlety I can live with. – TooTone Jun 03 '13 at 20:43
  • @joran Apologies, you did indeed mention that. I really appreciate finding out about this subtlety as in a way what I was most worried about when I wrote the question was the last, "Something else" category, a.k.a. the "Unknown unknowns". – TooTone Jun 03 '13 at 20:45
  • @TooTone I personally would benefit considerably if you post an answer showing the code you used to do the above-mentioned testing and breaking-up of the loop to look for copies. Please consider doing so. Thank you either way. – Mark Miller Jun 03 '13 at 21:41
  • @MarkMiller no worries. I tried to add an answer but the SO warning box indicated that was _bad_ so I have added [another question](http://stackoverflow.com/questions/16906751/clarification-of-copying-array-semantics-in-r-on-assignment-to-array) instead, which I hope is the appropriate stackexchange protocol. A lot of this is new to me, so my question may illustrate that as opposed to anything useful to you; I hope it is the latter! – TooTone Jun 03 '13 at 22:22

You are running both interpreted code and native code in both examples. The difference is that in the second you are doing the loop at the R level, resulting in many more function calls that all need to be interpreted before the C code is called. In your first example, the loop happens within the compiled code, and hence R has far less to interpret: far fewer R-level calls and far fewer calls to compiled code.
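
To make that concrete, here is a sketch of what the interpreter actually evaluates (the prefix-call form below is ordinary R written out by hand; names as in the question):

n <- 10^7
v <- runif(n); vv <- numeric(n); i <- 1L
# the loop body  vv[i] <- v[i] * v[i]  is sugar for three interpreted
# function calls, dispatched on every one of the n iterations:
vv <- `[<-`(vv, i, `*`(`[`(v, i), `[`(v, i)))
# whereas  v * v  crosses into compiled C code once and loops over
# all n elements there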

Gavin Simpson

In regard to parallel processing, out-of-the-box R does not do any parallel processing. Of course there is the built-in parallel package, but you have to adapt your code, using e.g. mclapply, to take advantage of it. There are also options to have your linear algebra calculated in parallel by using a special version of BLAS, but this is not standard in R, although getting it to work does not seem that hard.
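
A minimal sketch of the kind of adaptation involved (note that `mc.cores > 1` is only honoured on Unix-alikes; on Windows `mclapply` falls back to a serial `lapply`):

library(parallel)
v <- runif(10^6)
# serial: one interpreted function call per element
r1 <- unlist(lapply(v, function(x) x * x))
# parallel: same shape of code, forked across two cores
r2 <- unlist(mclapply(v, function(x) x * x, mc.cores = 2))
all.equal(r1, r2)
# for a function this trivial, the forking overhead swamps any gain;
# the vectorized v * v is still by far the fastest option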

Paul Hiemstra
  • Thanks, I guess parallel processing is one way in which Revolution Analytics R (love them or hate them) tries to make a business. – TooTone Jun 03 '13 at 19:09