If you have a larger number of columns your t(t(m) * v) solution outperforms the matrix multiplication solution by a wide margin. However, there is a faster solution, but it comes with a high cost in in memory usage. You create a matrix as large as m using rep() and multiply elementwise. Here's the comparison, modifying mnel's example:
m = matrix(rnorm(1200000), ncol=600)
v = rep(c(1.5, 3.5, 4.5, 5.5, 6.5, 7.5), length = ncol(m))
library(microbenchmark)
microbenchmark(t(t(m) * v),
m %*% diag(v),
m * rep(v, rep.int(nrow(m),length(v))),
m * rep(v, rep(nrow(m),length(v))),
m * rep(v, each = nrow(m)))
# Unit: milliseconds
# expr min lq mean median uq max neval
# t(t(m) * v) 17.682257 18.807218 20.574513 19.239350 19.818331 62.63947 100
# m %*% diag(v) 415.573110 417.835574 421.226179 419.061019 420.601778 465.43276 100
# m * rep(v, rep.int(nrow(m), ncol(m))) 2.597411 2.794915 5.947318 3.276216 3.873842 48.95579 100
# m * rep(v, rep(nrow(m), ncol(m))) 2.601701 2.785839 3.707153 2.918994 3.855361 47.48697 100
# m * rep(v, each = nrow(m)) 21.766636 21.901935 23.791504 22.351227 23.049006 66.68491 100
As you can see, using "each" in rep() sacrifices speed for clarity. The difference between rep.int and rep seems to be neglible, both implementations swap places on repeated runs of microbenchmark(). Keep in mind, that ncol(m) == length(v).
